AD 


Award  Number:  DAMD17-98-1-8119 


TITLE:  Genetic  Damage  Caused  by  ALU  Repeats  in  Breast  Cancer 


PRINCIPAL  INVESTIGATOR:  Prescott  L.  Deininger,  Ph.D. 


CONTRACTING  ORGANIZATION:  Tulane  University  Medical  Center 

New  Orleans,  Louisiana  70112-2699 


REPORT  DATE:  August  2001 


TYPE  OF  REPORT:  Final 

PREPARED  FOR:  U.S.  Army  Medical  Research  and  Materiel  Command 
Fort  Detrick,  Maryland  21702-5012 


DISTRIBUTION  STATEMENT:  Approved  for  Public  Release; 

Distribution  Unlimited 


The  views,  opinions  and/or  findings  contained  in  this  report  are 
those  of  the  author (s)  and  should  not  be  construed  as  an  official 
Department  of  the  Army  position,  policy  or  decision  unless  so 
designated  by  other  documentation. 


REPORT  DOCUMENTATION  PAGE 


Form  Approved 
OMB No. 074-0188


Public  reporting  burden  for  this  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources,  gathering  and  maintaining 
the  data  needed,  and  completing  and  reviewing  this  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of  information,  including  suggestions  for 
reducing  this  burden  to  Washington  Headquarters  Services,  Directorate  for  Information  Operations  and  Reports,  1215  Jefferson  Davis  Highway,  Suite  1204,  Arlington,  VA  22202-4302,  and  to  the  Office  of 
Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503


1 .  AGENCY  USE  ONLY  (Leave  blank)  2.  REPORT  DATE  3.  REPORT  TYPE  AND  DATES  COVERED 

August 2001  Final (1 Aug 98 - 31 Jul 01)


4.  TITLE  AND  SUBTITLE  5.  FUNDING  NUMBERS 

Genetic Damage Caused by ALU Repeats in Breast  DAMD17-98-1-8119

Cancer 


6. AUTHOR(S)

Prescott  L.  Deininger,  Ph.D. 


7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

Tulane  University  Medical  Center 
New Orleans, Louisiana 70112-2699


8.  PERFORMING  ORGANIZATION 
REPORT  NUMBER 


E-Mail:  pdeinin@tulane.edu 


9.  SPONSORING  /  MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

U.S.  Army  Medical  Research  and  Materiel  Command 
Fort  Detrick,  Maryland  21702-5012 


11.  SUPPLEMENTARY  NOTES 

Report  contains  color 


10.  SPONSORING  /  MONITORING 
AGENCY  REPORT  NUMBER 


20020124  224 


12a.  DISTRIBUTION  /  AVAILABILITY  STATEMENT 

Approved  for  Public  Release;  Distribution  Unlimited 


12b.  DISTRIBUTION  CODE 


13. ABSTRACT (Maximum 200 Words)

We  have  developed  a  series  of  allele-specific  PCR  amplification  procedures  that 
allow  us  to  amplify  the  flanking  sequences  from  the  most  recent  subfamilies  of  Alu 
elements  in  the  human  genome.  There  are  approximately  1000  elements  amplified  in  these 
experiments,  and  we  have  developed  several  strategies  for  amplifying  specific  subsets  of 
these elements. The goal is to identify subsets of elements that can be amplified and
'displayed' by a gel-based or subtractive method that will allow us to detect differences
in these recent elements in breast tumor vs. normal tissue from a patient. This will
allow us both to detect insertion of a new Alu element, and so assess the rate of
gene damage from retrotransposition, and to detect major sequence losses that
encompass one of these elements.


17.  SECURITY  CLASSIFICATION 
OF  REPORT 

Unclassified 


NSN  7540-01-280-5500 


18.  SECURITY  CLASSIFICATION 
OF  THIS  PAGE 

Unclassified 


19.  SECURITY  CLASSIFICATION 
OF  ABSTRACT 

Unclassified 


15.  NUMBER  OF  PAGES 


16.  PRICE  CODE 


20.  LIMITATION  OF  ABSTRACT 

Unlimited 


Standard  Form  298  (Rev.  2-89) 

Prescribed  by  ANSI  Std.  Z39-18 
298-102 


(4)  Table  of  Contents 


Front Cover

Form 298

Table of Contents

Introduction

Body

Key Research Accomplishments

Reportable Outcomes

Conclusions

References

Appendices (5 appended reprints)


(5)  Introduction: 

This project was based on the hypothesis that early cellular transformation events
involved in breast cancer formation might influence the amplification of human Alu repeats.
Any increase in Alu amplification might contribute to further destabilization of the human
genome and inactivation of tumor suppressors, which could contribute to the progression of breast
cancer. At least in sporadic cases, Alu insertions have been shown to contribute to a number of
cancers, including at least one case of breast cancer due to inactivation of BRCA2. We have
previously shown that only a specific set of subfamilies of Alu elements is actively amplifying
in the human genome. This project combines this information with an anchored PCR
procedure we have developed to form displays of the most recently amplified Alu elements. We
have demonstrated that this Allele-Specific Alu PCR (ASAP) will effectively display the
members of the smallest of the recent Alu subfamilies as bands on an acrylamide gel (5). Our
goal is to generalize these procedures to the larger subfamilies and explore various procedures to
deal with the larger number of bands expected. We will then use these procedures to compare
breast cancer and normal DNA from a number of individuals to determine whether there are
new, tumor-specific Alu inserts. This will allow us to determine whether this form of genetic
instability plays a role in human breast cancer.

Because of some difficulties with initial implementation of the ASAP assay, we also
designed approaches to use an L1 retrotransposition reporter gene system (Moran) to study the
specific influences on retrotransposition of genetic changes associated with tumorigenesis, as
well as environmental influences that may contribute to breast cancer. Because
it is thought that Alu elements utilize the same retrotransposition machinery as L1, this system
should allow an alternate assessment of the primary question of whether retroelement insertions
are likely to contribute to breast cancer genomic instability.


(6)  BODY 

Original  Goals: 

First  Six  Months: 

•  Optimization  of  ASAP.  Our  primary  goal  will  be  to  optimize  the  Allele-Specific  PCR 
further.  We  will  work  to  identify  the  very  best  PCR  primers  to  allow  the  most  effective 
allele-specific  amplification  of  the  Alu  inserts  and  flanks.  This  will  allow  us  to  develop  a 
procedure  with  both  minimal  steps  and  minimal  background  in  the  later  experiments. 

•  No  patient  samples  will  be  needed  at  this  stage. 

First  Year: 

•  Optimization  of  Displays.  We  will  utilize  the  ASAP  procedure  to  generate  test  samples 
from  all  three  relevant  Alu  subfamilies,  which  can  then  be  utilized  to  improve  the  display 
procedures,  in  particular  the  subdivision  with  PCR  into  16  subdivisions.  We  will  begin  to 
explore  ways  to  utilize  subtraction  procedures  on  these  samples. 

•  No patient samples will be needed at this stage.

Second  Year: 

•  Refinement of Subtraction Technology. Technical development will continue with
refinement  of  the  subtraction  procedures  and  tests  of  the  sensitivity  of  detection  of  bands 
and  the  ability  to  pool  samples  in  the  PCR  reactions. 

•  Preliminary  work  on  tumor  samples.  Work  will  begin  with  existing  technology  to  carry 
out  analysis  on  tumor  samples.  We  expect  to  have  carried  out  analysis  of  the  first  10-20 
samples  in  this  year.  We  will  use  this  experience  to  determine  the  best  approach  to 
generate  data  in  a  production  mode.  This  will  provide  an  initial  feel  for  the  level  of 
diversity  in  the  displays  and  a  basic  characterization  of  any  diversity  to  determine 
whether  it  is  caused  by  insertions.  Any  evidence  of  other  forms  of  genomic  instability 
influencing  the  assay  will  be  assessed  at  this  point  and  procedures  optimized  to 
compensate. 

Third  Year: 

•  Completion  of  Tumor  Samples.  During  the  previous  year,  we  expect  to  have  optimized 
the  ASAP  procedures  and  their  display  completely.  This  will  allow  us  to  have 
determined  the  most  effective  approach  for  analysis  of  large  numbers  of  samples.  We 
will  utilize  this  year  solely  to  generate  data  on  as  many  tumors  as  possible.  We  will  focus 
our  efforts  initially  on  late  stage  tumors,  but  will  move  progressively  towards  earlier 
stage  tumors,  particularly  if  we  detect  extensive  Alu  amplification  at  late  stages. 

•  We  expect  to  complete  100  samples  by  the  end  of  the  third  year.  It  is  our  hope  that  the 
subtraction  of  pooled  samples  will  increase  the  data  flow  and  we  can  carry  out 
experiments  on  enough  samples  to  be  able  to  analyze  subgroups  based  on  tumor  stage, 
ethnic  origin  of  tumor  or  other  correlations  with  clinical  features  or  treatment. 

By the second year it became clear that there were more technical difficulties in getting
the displays fully optimized and implementable on a large number of samples, and our goals had
to be scaled back to a more pilot level. In addition, last year we reported in our progress report
an alternative approach to address the critical issue of whether retrotransposition plays a critical
role in breast cancer progression. The approach was to use a reporter system for L1
retrotransposition and test whether genetic alterations associated with tumorigenesis altered
retrotransposition rates.


Accomplishments  of  the  three  year  period: 

(This includes a summary of the first two years' work, although without the detail
placed  in  those  reports). 

During the first two years we explored a wide range of approaches for optimizing
displays of the most recently inserted Alu inserts. Year 1 focused primarily on the PCR-based
display itself, utilizing a number of variations both to increase the resolution of the technique and
to deal with the large numbers of elements in some of the more active subfamilies,
which gave rise to too many elements to allow our assay to work. We were successful at
generating quality displays for the very smallest subfamilies of elements. We also had some
success utilizing various less frequent restriction digestions to allow us to display a limited subset
of the more abundant subfamilies. Our biggest difficulty at this point was to figure out how to
display the 2000 Ya5 subfamily members (which are responsible for the majority of Alu inserts
causing disease) without the massive number of bands obscuring the variant signals. We had
limited success with the use of PCR primers that added two bases to the end of the primer that
went into the genomic flanking sequence to allow us to display one sixteenth of the group of
bands at a time. Several primers gave us decent, although not crisp, displays. I believe that our
biggest problem with this approach was that some of the primers could sit down on sites in
which the last two bases paired using non-Watson-Crick pairing (e.g., G-T pairing), resulting
in weaker bands that created background. In our efforts, although several primers worked fairly
well, others worked very poorly. A number of variations (including perfect match, altered
stringency, etc.) did not ultimately improve these displays. Perhaps our biggest disappointment
was that several attempts to utilize subtraction strategies to eliminate the common bands did not
work at all. Our only observation was that the bands all got lighter, and even attempts to spike a
unique band into the mix did not allow us to enrich the unique band. These studies may have been
influenced by the presence of a small segment of common repetitive DNA sequence on the end
of each fragment, and they may have also been made more difficult by the very high A+T
content of the sequences adjacent to Alu elements.
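
The one-sixteenth subdivision follows from the four possible bases at each of the two added primer positions (4 x 4 = 16). The following is a minimal Python sketch of that partitioning, assuming perfect discrimination at both added bases (that is, ignoring the G-T mispairing described above). The function names and the example flanks are hypothetical and are not data from this project.

```python
from collections import defaultdict
from itertools import product

BASES = "ACGT"

def two_base_extensions():
    """Return the 16 possible two-base extensions of the anchored primer."""
    return ["".join(pair) for pair in product(BASES, repeat=2)]

def partition_by_flank(flanks):
    """Group flanking sequences by the two bases the extension must match.

    Each group models the bands expected from one extended primer,
    assuming both added bases discriminate perfectly.
    """
    groups = defaultdict(list)
    for flank in flanks:
        groups[flank[:2].upper()].append(flank)
    return dict(groups)

# Hypothetical flanking sequences, for illustration only.
example_flanks = ["AATTCG", "AAGGTC", "GTACCA", "TTTAAA"]
subsets = partition_by_flank(example_flanks)
```

With roughly 2000 Ya5 members, an even split would leave on the order of 125 bands per extended primer, which is why each display shows only a fraction of the subfamily.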

As more human genomic sequence was made available in GenBank, we were able to
identify new subfamilies of Alu elements. More importantly, we found that some of the
subfamilies showed very high levels of polymorphism in the human genome. Using a
combination of bioinformatics with measurements of the polymorphism associated with these
different subfamilies, we were able to determine the relative age and copy number of each of
these subfamilies and provide estimates of their likelihood of current activity. Although these
data  did  provide  some  new,  smaller  subfamilies  that  we  could  adapt  to  our  display  technique,  by 
far  the  majority  of  Alu  elements  that  had  inserted  recently  to  cause  disease  still  remained  as  part 
of  the  larger  Ya5  and  Yb8  subfamilies.  Thus,  our  original  plan  of  displaying  the  majority  of 
potential  Alu  inserts  in  tumor  DNA  was  not  going  to  work  with  this  approach. 

As  we  approached  year  3,  we  also  began  to  tackle  some  of  the  issues  associated  with 
adapting this technique to a number of tumor tissues to allow a reasonable sampling. If anything,
the tumor tissues were even more intractable, partly because the DNA was not always of as high
a quality as the tissue culture and blood DNAs that we were using in the pilot
experiments. Furthermore, our display would be seriously handicapped by any heterogeneity in
the  tumor  tissue  that  might  weaken  the  signals,  while  not  lessening  the  background.  Therefore, 
although  we  worked  out  the  ability  to  display  distinct  subsets  of  the  recent  Alu  inserts,  we  were 
never  able  to  adapt  the  technique  to  be  able  to  display  a  significant  portion  of  these  inserts  in  a 
manner  which  convinced  us  that  we  would  be  able  to  see  any  significant  portion  of  new  inserts. 
Given  that  new  inserts  may  have  been  as  low  as  one  in  100  tumors,  we  began  to  explore 
alternative  approaches  for  addressing  the  potential  role  of  retrotransposition  in  breast  cancers. 

Although the ideal was to look at authentic tumor tissues and look for authentic Alu
inserts, we could obtain a reasonably good picture of the relative impact by using a reporter system
introduced into tumor cells and measuring the rate of retrotransposition of the reporter system in
normal versus transformed cells. The development of an L1 element that activated a neomycin
selection cassette upon retrotransposition provided a potential method to quantify L1
retrotransposition rates in tumors. Furthermore, as most of us believe that Alu retrotransposes
with the L1 machinery, using the L1 system should provide insight into both L1 and Alu rates.

Our initial experiments using p53 transformation as a model were very promising and
were reported in the last report. However, as we have learned more about the L1 assay, we
believe that those preliminary results were an artifact caused by the stimulatory influence of the
mutant p53 causing the cells to grow faster. To some extent this is also a function of cell plating
density and whether the G418 selection for neomycin resistance is able to be effective before the
cells approach confluence. Ultimately, after many repetitions, we can see no influence of p53
mutation on the L1 retrotransposition rate. However, we also wanted to look at the effect of cell
cycle in general, and we have been able to demonstrate that slowing cell growth by a factor of
two by lowering the growth temperature results in an order of magnitude decrease in
retrotransposition rates. Furthermore, this effect correlates with growth rate and not just
temperature. If the temperature is lowered just at the beginning of the assay, the rate does not
change. Thus, the L1 enzymes are not themselves sensitive to temperature; instead, lowering the
temperature for a prolonged period has a secondary effect that greatly lowers retrotransposition
rates. We have utilized fluctuation analysis on long-term transformants for all of these assays
and have also created a transient transfection-based assay. At this point we are gearing up to
look at various breast cancer cell lines for their retrotransposition potential, as well as cells with
various genetic defects associated with tumorigenesis and DNA repair. Thus, although we
cannot yet answer the question of whether transformation alters retrotransposition, and therefore
whether retrotransposition may contribute to the progression of cancer, we now have the tools and should
be able to test a number of model systems soon.
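
The report does not state which estimator was applied in the fluctuation analysis. As an illustration only, the sketch below uses the classical P0 (zero-class) method: if a fraction P0 of parallel cultures shows no G418-resistant colonies, the mean number of events per culture is m = -ln(P0), and one common convention divides m by the number of cells per culture to obtain a rate. The function name and the example numbers are hypothetical.

```python
import math

def p0_rate(cultures_without_events, total_cultures, cells_per_culture):
    """Estimate an event rate per cell with the P0 method.

    Assumes events are Poisson-distributed across parallel cultures,
    so the fraction with zero events is exp(-m).
    """
    if not 0 < cultures_without_events <= total_cultures:
        raise ValueError("need at least one culture without events")
    p0 = cultures_without_events / total_cultures
    mean_events = -math.log(p0)  # m, mean events per culture
    return mean_events / cells_per_culture

# Hypothetical example: 5 of 10 cultures with no resistant colonies,
# one million cells per culture.
example_rate = p0_rate(5, 10, 1_000_000)
```

Because the estimate depends on the number of cells per culture, comparisons between conditions that change growth rate (such as the temperature shift above) need cultures scored at comparable cell numbers.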


(7)  Key  Research  Accomplishments

Year  1 

•  Establishment  of  optimum  conditions  for  amplification  of  the  most  recent  subfamilies  of 
Alu  inserts 

•  Obtaining  clear  displays  of  the  Ya8  subfamily  on  acrylamide  and  agarose  gels  which 
allow  the  isolation  of  insertion  polymorphisms  between  different  individuals. 

•  Demonstrating  the  use  of  modified  primers  that  display  subsets  of  the  Ya5  elements  that 
will  allow  at  least  a  substantial  portion  of  Ya5  inserts  to  be  studied. 

Year  2 

•  Identification  of  the  youngest,  most  active  Alu  subfamilies  that  can  be  amplified  and 
displayed  directly  without  the  use  of  subtraction  protocols. 

Year  3 

•  Development  of  a  complete  understanding  of  the  recent  amplification  of  Alu  elements  in 
the  human  genome  based  on  the  fusion  of  bioinformatics  on  the  complete  human  genome 
sequence  and  laboratory-based  studies. 

•  Development  of  approaches  to  use  retroposition  reporter  gene  systems  for  studies  of  the 
role  of  various  genes  and  environmental  influences  on  the  retrotransposition  frequency. 

(8)  Reportable  Outcomes 

Astrid  Roy-Engel  was  supported  by  this  grant. 

■  Deininger, P. and Batzer, M. (1999) Alu repeats and human disease. Mol Gen and
Metab 67, 183-193.

■  Roy, A.M., M. Carroll, D.H. Kass, Sun, MA. Batzer, P.L. Deininger (1999)
Recently integrated human Alu repeats: Finding needles in the haystack.
Genetica 107, 149-61.

■  Roy, A.M., M.L. Carroll, S.V. Nguyen, A.-H. Salem, M. Oldridge, A.O.M.
Wilkie, M.A. Batzer, and P. L. Deininger (2000) Potential gene conversion
and source gene(s) for recently integrated Alu elements. Genome Research
10, 1485-1495.

■  Roy-Engel, ML Carroll, E. Vogel, RK Garber, SV Nguyen, A-H Salem, MA
Batzer and P. Deininger (2001) Alu insertion polymorphisms for the study of
human genomic diversity. Genetics (in press).

■  ML. Carroll, A. Roy-Engel, SV. Nguyen, A-H Salem, E. Vogel, B. Vincent, J. Myers,
Z. Ahmed, L. Nguyen, M. Sammarco, WS. Watkins, J. Henke, W. Makalowski, LB.
Jorde, P. Deininger, and MA. Batzer (2001) Large-scale analysis of the Alu Ya5 and
Yb8 subfamilies and their contribution to human genomic diversity. J. Mol. Biol. (in
press).


(9)  Conclusions

We were able to develop a PCR procedure that can selectively amplify the subset of most
recently inserted Alu elements. Although we were able to display a subset of these elements, we
were unable to overcome enough of the technical difficulties to allow an assessment of the number of
Alu insertions occurring in breast tumors.

We  developed  quantitative  approaches  to  measure  the  retrotransposition  capability  of 
different  cell  types  using  a  reporter-gene  approach.  Using  this  approach  we  showed  that 
dominant  negative  p53  mutations  did  not  alter  retrotransposition  rates,  but  that  major  changes  to 
cells  influencing  growth  rates  had  a  tremendous  influence.  We  are  currently  gearing  up  for  a  full 
assessment  of  breast  cancer  cell  lines,  and  a  number  of  genes  associated  with  tumorigenesis 
using  this  quantitative  assay. 


Alu elements comprise >10% of the human genome. We have used a computational biology approach to
analyze the human genomic DNA sequence databases to determine the impact of gene conversion on the
sequence diversity of recently integrated Alu elements and to identify Alu elements that were potentially
retroposition competent. We analyzed 269 Alu Ya5 elements and identified 23 members of a new Alu subfamily
termed Ya5a2 with an estimated copy number of 35 members, including the de novo Alu insertion in the NF1
gene. Our analysis of Alu elements containing one to four (Ya1-Ya4) of the Ya5 subfamily-specific mutations
suggests that gene conversion contributed as much as 10%-20% of the variation between recently integrated
Alu elements. In addition, analysis of the middle A-rich region of the different Alu Ya5 members indicates a
tendency toward expansion of this region and subsequent generation of simple sequence repeats. Mining the
databases for putative retroposition-competent elements that share 100% nucleotide identity to the previously
reported de novo Alu insertions linked to human diseases resulted in the retrieval of 13 exact matches to the NF1
Alu repeat, three to the Alu element in BRCA2, and one to the Alu element in FGFR2 (Apert syndrome).
Transient transfections of the potential source gene for the Apert's Alu with its endogenous flanking genomic
sequences demonstrated the transcriptional and presumptive transpositional competency of the element.


Alu elements belong to a class of retroposons termed
SINEs. SINEs are Short INterspersed Elements, usually
~100-300 bp in length, commonly found in introns, 3'
untranslated regions of genes, and intergenic genomic
regions (Deininger and Batzer 1993). Alu is the most
abundant class of SINEs in primate genomes, reaching
a copy number in excess of one million/haploid genome
(Jelinek and Schmid 1982; Jurka et al. 1993; Smit
1999). Alu elements increase their genomic copy number
by an amplification process termed retroposition
(Rogers and Willison 1983; Weiner et al. 1986).

Alu elements appear to have arisen in the last 65
million years (Deininger and Daniels 1986). The human
Alu family of repeats is composed of a small number
of distinct subfamilies characterized by subfamily-specific
diagnostic mutations (Slagel et al. 1987;
Willard et al. 1987; Shen et al. 1991; Batzer et al.
1996b). The source Alu gene(s) for each of the subfamilies
has been retropositionally active during different
periods of primate evolution. The rate of Alu amplification
(mostly Sx subfamily) appears to have reached
its peak between 60 and 35 million years ago, and subsequently
decreased several orders of magnitude to the
present amplification rate (Shen et al. 1991). Only a
limited number of SINEs, termed master or source
genes, appear to be capable of retroposition (Deininger
and Daniels 1986; Batzer et al. 1990; Deininger et al.
1992), although the critical factor(s) defining functional
source genes are not understood. A variety of
factors influence the retroposition process (Schmid
and Maraia 1992). All of the recently integrated young
Alu subfamilies appear to be retropositionally active.
Almost all of the recently integrated Alu elements
within the human genome belong to one of four
closely related subfamilies (Y, Ya5, Ya8, and Yb8), with
the majority being Ya5 and Yb8 subfamily members
(Batzer et al. 1990, 1995; Deininger and Batzer 1999).

These authors contributed equally to this work as senior authors.

Previously, analysis of individual Alu elements
from the different subfamilies involved laborious procedures,
such as cloning, library screening, and subsequent
sequencing (Batzer et al. 1990, 1995; Arcot et al.
1995a). However, the availability of large-scale human
genomic DNA sequences as a result of the Human Genome
Project facilitates genomic database mining for
Alu elements (Roy et al. 1999). We have taken advantage
of these databases and have analyzed a significant
portion of the Alu Ya5 subfamily, as well as intermediates
between the Ya5 subfamily and the ancestral Alu
Y subfamily. In addition, we searched the databases for
putative retroposition-competent source Alu genes
that generated the de novo Alu inserts associated with
a number of human diseases (Deininger and Batzer
1999).

RESULTS 

Computational  Analyses 

To search for subfamilies unidentified previously
within the Ya5 Alu subfamily, we selected all of the Alu
family members that matched our Ya5 consensus
query sequence from the human genome non-redundant
(nr) database. Only Ya5 elements found
randomly within other sequences were included in our
analysis, thereby eliminating Alu elements that had
been identified previously in directed Alu-specific
projects. In addition, truncated Alu elements were
eliminated from the analysis. Ya4 elements that did
not contain the first Ya5-specific diagnostic mutation
#11 (Fig. 1) (Shen et al. 1991), which is a CpG dinucleotide
in the Ya5 subfamily, were considered as Ya5 Alu
family members. We obtained a total of 269 matches to
the Ya5 query sequence that met our criteria. Of these,
47 shared 100% nucleotide identity with the subfamily
consensus sequence and 83 were near perfect matches
(aside from a few CpG mutations).

Analysis of the 269 Ya5 Alu elements resulted in
the initial identification of two subsets of potential
subfamilies containing two diagnostic mutations each,
one with six members and the other with four. These
subfamilies will be referred to as Ya5a2 and Ya5b2, respectively,
in compliance with the standard Alu subfamily
nomenclature (Batzer et al. 1996a). Each consensus
sequence with the two diagnostic mutations
specific to each new Alu subfamily is shown in Figure
1. Interestingly, the de novo Alu Ya5 insert present
within an intron of the NF1 gene (Wallace et al. 1991)
is an exact match to the Ya5a2 consensus. The nr database
contained 16.0% of human DNA sequences for
a total of 515,596,000 bases on the date of the search.
The estimated size of the Ya5a2 subfamily is
(3 × 10^9 bp / 515,596,000 bp) × 6 unique Ya5a2
matches = 35 subfamily members. In comparison,
the estimated size of the Ya5b2 subfamily is
(3 × 10^9 bp / 515,596,000 bp) × 4 unique Ya5b2
matches = 22 subfamily members. We utilized only the
randomly found Ya5a2 elements for the calculations
to avoid overestimating the size of
the subfamilies. However, these numbers
may be underestimations, because some
specific polymorphic elements of these
subfamilies may not be represented in the
database.
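
The copy-number estimate above is a simple proportional scaling: the number of matches found in the searched sequence is multiplied by the ratio of genome size to bases searched. A minimal Python sketch follows, taking the genome size as 3 x 10^9 bp as in the calculation above. The function name is hypothetical.

```python
def estimate_copy_number(unique_matches, bases_searched, genome_size_bp=3e9):
    """Scale matches found in a partial database up to the whole genome.

    Assumes the searched sequence is a random sample of the genome.
    """
    if bases_searched <= 0:
        raise ValueError("bases_searched must be positive")
    return unique_matches * genome_size_bp / bases_searched

# Six randomly found Ya5a2 elements in 515,596,000 bases of the nr database.
ya5a2_estimate = estimate_copy_number(6, 515_596_000)
```

Restricting the count to randomly found elements matters here, because elements deposited by directed Alu-specific projects would inflate the numerator without a matching increase in bases searched.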

To derive a second estimate of the copy
numbers of the Ya5a2 and Ya5b2 Alu subfamilies,
we used their consensus sequences
as queries for the high throughput
genome sequence (htgs) and genomic survey
sequence (gss) databases. Seventeen additional
Alu Ya5a2 elements were found in
these searches. Of the 23 total Ya5a2 elements,
13 shared 100% nucleotide identity
with the subfamily consensus sequence. No
additional Ya5b2 elements were found in
the other databases; therefore, the Ya5b2
subfamily was not subjected to further
analysis. Three additional potential subfamilies,
Ya5a1 (five members), Ya5b1 (four
members), and Ya5c1 (four members) with
only one specific diagnostic mutation were
identified (Fig. 1). Because of the small
copy number, and the possibility that some


Ya5  GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCGGA   60
Ya5  TCACGAGGTCAGGAGATCGAGACCATCCCGGCTAAAACGGTGAAACCCCGTCTCTACTAA  120   (diagnostic positions 11, 12)
Ya5  AAATACAAAAAA-TTAGCCGGGCGTAGTGGCGGGCGCCTGTAGTCCCAGCTACTTGGGAG  179   (diagnostic positions 13, 14)
Ya5  GCTGAGGCAGGAGAATGGCGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCCCG  239   (diagnostic position 15)
Ya5  CCACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCCGTCTC                    281

Differences from the Ya5 consensus, by alignment row (the columns of the substitutions are not legible):

Ya5a2  row 61-120: A;  row 121-179: A at the gap (row ends at 180; total length 282)
Ya5b2  row 61-120: A and C
Ya5a1  row 61-120: G
Ya5b1  row 121-179: G
Ya5c1  row 180-239: G


Figure 1  Consensus sequence alignment of Ya5 and the potential new subfamily
members identified. Nucleotide substitutions at each position are indicated with
the appropriate nucleotide. Deletions are marked by dashes (-). The Ya5 diagnostic
nucleotides are indicated in bold with the corresponding diagnostic number above
as defined by Shen et al. (1991).

of those represent parallel mutations rather than new
subfamilies, no further analyses were performed.

To determine the age of the Ya5a2 subfamily, we divided the nucleotide substitutions
within the elements into those that have occurred in CpG dinucleotides and those that
have occurred in non-CpG positions. The distinction between types of mutations is
made because CpG dinucleotides mutate at a rate that is ~10 times faster than non-CpG
positions (Labuda and Striker 1989; Batzer et al. 1990) as a result of the
deamination of 5-methylcytosine (Bird 1980). A total of five non-CpG mutations and
seven CpG mutations occurred within the 23 Alu Ya5a2 subfamily members identified.
Combining a neutral rate of evolution for primate intervening DNA sequences of 0.15%
per million years (Miyamoto et al. 1987) with the non-CpG mutation rate of 0.092%
(5/5382 bases, using only non-CpG bases) within the 23 Ya5a2 Alu elements yields an
estimated average age of 0.62 million years for the Ya5a2 subfamily members, with a
predicted 95% confidence range of 0.28-1.08 million years, given that the mutations
were random and fit a binomial distribution. The Ya5a2 subfamily appears to be much
younger than the Ya5, Ya8, or Yb8 Alu subfamilies, with estimated ages of 2.8 million
years (Batzer et al. 1990), 2.75 million years (Roy et al. 1999), and 2.7 million
years (Batzer et al. 1995), respectively (Fig. 2).
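
The point estimate of the subfamily age can be checked with a few lines of arithmetic. The sketch below is illustrative only: it divides the observed non-CpG divergence by the neutral substitution rate quoted in the text. The function and variable names are our own, and the confidence range reported in the text is not recomputed here.

```python
# Values quoted in the text; the names are illustrative, not from the paper.
NON_CPG_MUTATIONS = 5          # non-CpG substitutions in the 23 Ya5a2 elements
NON_CPG_BASES = 5382           # non-CpG bases summed over the 23 elements
NEUTRAL_RATE_PER_MYR = 0.0015  # 0.15% per million years (Miyamoto et al. 1987)


def estimate_age_myr(mutations, bases, rate_per_myr):
    """Return the average age in millions of years: divergence / neutral rate."""
    divergence = mutations / bases
    return divergence / rate_per_myr


divergence = NON_CPG_MUTATIONS / NON_CPG_BASES
age_myr = estimate_age_myr(NON_CPG_MUTATIONS, NON_CPG_BASES, NEUTRAL_RATE_PER_MYR)

print(f"non-CpG divergence: {divergence:.4%}")
print(f"estimated average age: {age_myr:.2f} million years")
```

The same function can be applied to any other subfamily for which the number of non-CpG substitutions and the number of non-CpG bases surveyed are known.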

Determination  of  the  number  of  elements  that 
perfectly  match  the  subfamily  consensus  sequence  can 
also  give  an  indirect  estimate  of  Alu  subfamily  age  and 
recent  rate  of  mobilization.  Recently  transposed  Alu 
Figure 2 Schematic for the evolution of recently integrated Alu subfamilies. The
origin of the Ya5a2 Alu subfamily is shown after the divergence of Ya5 and Yb8
elements. The total number of elements found in the nr database (perfect matches in
parentheses) is shown first, separated by a slash from the total number of elements
found in all three databases (nr, gss, htgs). For the Ya5 elements, only the nr
database results are shown.


Table 1. Alu Middle A-Rich Region
(a) Data from the non-redundant database only.
(b) All 23 Ya5a2 members are included.


elements share higher levels of nucleotide identity with their source copies because
they have not resided in the genome long enough to accumulate random mutations. In
contrast, older Alu elements that have resided in the genome for longer periods of
time tend to have less nucleotide identity with their source genes as a result of
the accumulation of random mutations subsequent to integration into the genome. We
compared our search results for the Ya5a2 subfamily with parallel searches from the
Ya8 and Ya5 Alu subfamilies. Our BLAST searches of the nr database yielded 1 perfect
match of 12 elements for Ya8, 47 of 269 for Ya5, and 3 of 6 for Ya5a2 (Fig. 2).
Searching all three databases (nr, gss, and htgs) yielded 5 perfect matches of 27
for Ya8 and 13 of 23 for Ya5a2. These results are in good agreement with the previous
estimates, indicating that Ya5a2 is the youngest Alu subfamily reported to date, as
it also has the highest proportion of elements that share 100% nucleotide identity
with the consensus sequence.
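
The comparison of perfect-match proportions can be made explicit. The following is a minimal sketch using only the counts quoted in the paragraph above; the dictionary keys and variable names are our own labels.

```python
# (perfect matches, total elements) as quoted in the text.
counts = {
    "Ya8 (nr)": (1, 12),
    "Ya5 (nr)": (47, 269),
    "Ya5a2 (nr)": (3, 6),
    "Ya8 (nr+gss+htgs)": (5, 27),
    "Ya5a2 (nr+gss+htgs)": (13, 23),
}

# Fraction of elements identical to the subfamily consensus sequence.
perfect_share = {name: perfect / total for name, (perfect, total) in counts.items()}

for name, share in sorted(perfect_share.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22} {share:6.1%}")

# The subfamily with the largest share of perfect matches in the nr database.
youngest_by_identity = max(
    (name for name in perfect_share if name.endswith("(nr)")),
    key=perfect_share.get,
)
print("highest proportion in nr:", youngest_by_identity)
```

A higher share of consensus-identical copies is read here as indirect evidence of more recent mobilization, in line with the argument in the text.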

Stability of the Middle A-Rich Region in Alu Ya5 Members

The oligo-dA-rich tails and middle A-rich regions of Alu elements have been shown
previously to serve as nuclei for the genesis of simple sequence repeats (Arcot
et al. 1995b). In the autosomal recessive neurodegenerative disease Friedreich
ataxia, the most common mutation is the hyperexpansion of a GAA trinucleotide repeat
within the middle A-rich region of an Sx Alu element (Montermini et al. 1997).
Because these regions appear unstable, we analyzed the middle A-rich region of Alu
elements retrieved from the databases to detect expansions/contractions of this
sequence.

To evaluate potential expansions/contractions, we performed a BLAST query of three
databases (nr, htgs, and gss) using the Alu Ya5 consensus sequence with varying
numbers of A nucleotides within the middle A-rich region (TA5TACA6TT, where the
numerals are subscripts giving the number of consecutive A residues in the
consensus). Our results demonstrate that the majority of the elements identified
matched the consensus sequence. However, there is a trend for an A expansion at both
positions (Table 1). In contrast, very few sequence contractions were detected for
any of the positions.
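
As an illustration of how the two A runs in the middle A-rich region could be measured in retrieved sequences, the sketch below uses a regular expression rather than the BLAST queries described above. The pattern, function names, and test fragments are our own assumptions; the consensus run lengths (5 and 6) are read from the Ya5 consensus in Figure 1.

```python
import re

# T, first A run, TAC, second A run, TT (the middle A-rich region motif).
MIDDLE_A_RICH = re.compile(r"T(A+)TAC(A+)TT")
CONSENSUS_RUNS = (5, 6)  # A-run lengths in the Ya5 consensus


def a_run_lengths(seq):
    """Return the lengths of the two A runs, or None if the motif is absent."""
    match = MIDDLE_A_RICH.search(seq.upper())
    if match is None:
        return None
    return len(match.group(1)), len(match.group(2))


def classify(runs, consensus=CONSENSUS_RUNS):
    """Label each A run as 'expansion', 'contraction', or 'consensus'."""
    labels = []
    for observed, expected in zip(runs, consensus):
        if observed > expected:
            labels.append("expansion")
        elif observed < expected:
            labels.append("contraction")
        else:
            labels.append("consensus")
    return tuple(labels)


# Fragment of the Ya5 consensus spanning the middle A-rich region.
consensus_fragment = "CTCTACTAAAAATACAAAAAATTAGCC"
print(a_run_lengths(consensus_fragment), classify(a_run_lengths(consensus_fragment)))
```

The pattern is applied here to a short fragment; on a full-length element it may be safer to restrict the search to the expected coordinates of the middle A-rich region.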

Human  Genomic  Variation 

To determine the human genomic variation associated with the Ya5a2 Alu subfamily
members, we selected the 13 Ya5a2 elements identical to the subfamily consensus
sequence as well as 2 others and determined the degree of fixation associated with
the elements using PCR-based assays of a panel of diverse human DNA samples with the
primers shown in Table 2. The panel is composed of 20 individuals each from
European, African-American, Greenland native, and Egyptian populations, for a total
of 80 individuals (160 chromosomes). The Alu elements were classified as fixed
absent, fixed present, or high, intermediate, or low frequency insertion
polymorphisms (see Table 3 for definitions). By use of this approach, 3 of the 14
elements tested (Ya5NBC206, Ya5NBC207, and Ya5NBC235) were always present in the
human genomes that were surveyed, suggesting that these elements became fixed in the
genome prior to the radiation of modern humans from Africa. Five of the elements
(Ya5NBC208, Ya5NBC240, Ya5NBC241, Ya5NBC242, and Ya5NBC220) are intermediate
frequency Alu insertion polymorphisms. The remaining six elements are low-frequency
Alu insertion polymorphisms (Table 3). The population-specific genotypes and levels
of heterozygosity for each element are shown in Table 4. The high proportion of
polymorphic elements is in good agreement with our other observations, indicating
that the Ya5a2 subfamily is younger than any of the other Alu subfamilies identified
previously in the human genome.
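
The Alu insertion frequencies listed in Table 4 follow directly from the genotype counts by allele counting. The sketch below is illustrative; the function name is our own, and the example counts are those reported for Ya5NBC208 in two of the populations in Table 4.

```python
def alu_frequency(n_plus_plus, n_plus_minus, n_minus_minus):
    """Frequency of the Alu-present allele from +/+, +/-, and -/- genotype counts."""
    chromosomes = 2 * (n_plus_plus + n_plus_minus + n_minus_minus)
    if chromosomes == 0:
        raise ValueError("no genotypes scored")
    # Each +/+ individual carries two insertion alleles, each +/- carries one.
    return (2 * n_plus_plus + n_plus_minus) / chromosomes


# Ya5NBC208 genotype counts (+/+, +/-, -/-) as listed in Table 4.
freq_first = alu_frequency(4, 1, 7)
freq_second = alu_frequency(13, 0, 6)

print(f"{freq_first:.3f}")
print(f"{freq_second:.3f}")
```

An element scored as fixed present corresponds to a frequency of 1.000 in every population, and a fixed absent element to 0.000.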

Gene  Conversion  and  Alu  Sequence  Diversity 
In our query of the human genome (nr) database, 91 of the Alu elements identified
contain one to four of the five Ya5 diagnostic nucleotides (Fig. 1). Of these 91
intermediate elements, 4 are Ya1, 1 Ya2, 7 Ya3, and 79 Ya4 Alu elements (Fig. 3).
Surprisingly, not all of the Alu elements with different numbers of subfamily
mutations had the same combination of mutations. To facilitate identification of the
individual elements with different diagnostic mutation combinations, the diagnostic
nucleotides were numbered consecutively in order of abundance (Ya3.1, Ya3.2, etc.;
see Fig. 3). Seventeen Alu elements (Ya4.4) did not contain the first diagnostic
mutation (#11), but were still classified as Ya5 for the analyses outlined above.

Previous evolutionary analyses of the Ya5 founder element with different primate DNA
samples demonstrated the sequential accumulation of the Ya5 diagnostic mutations,
with diagnostic positions #13/#14 first, followed by #12/#16, and finally position
#11 (Shaikh and Deininger 1996). Our data are not consistent with a sequential order
in the accumulation of the diagnostic mutations. The elements classified as Ya1,
Ya2, Ya3.4, Ya3.5, and Ya4.4 (26 total) fit the proposed order (Fig. 3). However,
the remaining 65 elements represent almost every other permuted order. Several
mechanisms could explain the occurrence of mosaic


Table 2. Alu Ya5a2 PCR Primers, Chromosomal Locations, and PCR Product Sizes

                                                                                           Product size(c)
Name          5' Primer sequence (5'-3')   3' Primer sequence (5'-3')    A.T.(a)  Chr.(b)  Filled  Empty
Ya5NBC206     TCCTTACCTATCrCACAACCTACAT    ACACATTTCCTTCAAGAGGTCAAAG     60°C     4        734     424
Ya5NBC207     CAGTTTTATACACTGGCCTCTTTTC    TTGTAGGAGAAAGAGGGGAAATACT     50°C     6        443     122
Ya5NBC208     AATACCTTGTACATCTTCACCCCTA    TCTCTCTGCTGCACAGTTTGTT        50°C     14       441     115
Ya5NBC240     CAGGAGATAAATATGTTCGGAGAGT    TAACTGGGACAGTGAGTTTTACCTG     55°C     9        505     202
Ya5NBC241     GGTTCCAATAGAGAGCAACAGAA      ACCTTAAGCTTTCCCCCAGA          55°C     15       392     66
Ya5NBC242     AACAAAATTCCCTTTCCTCCA        GGCAATCTGACCTTGGGTAA          55°C     7        503     192
Ya5NBC7       TGATGGATATTTGGGTTGGTTC       GGACTGTAAACTAGTTCAACCATTGTG   60°C     7        522     216
Ya5NBC205     ACATGAAGGGCCGACTGTAT         TGCTGCTGCATTATCAACTG          50°C     21       435     81
Ya5NBC209     GTCTATGGGAAGATGAAGAATACGA    GATGGAGTCACTCATGTGAAAAGTA     55°C     14       447     116
Ya5NBC239     CAGCTGAGAACTGTCACAAATAGAA    ATCAATGACTGACTTGTGCTGAGT      55°C     9        531     198
Ya5NBC243     CCATGATTCGTCATTCACCA         AGGAGACCTGCCAATGAATG          60°C     21       406     86
Ya5NBC220     AAATCAAGCTGCCATACCTCA        CAAACCATCCTTCACAGTGG          60°C     1        463     141
Ya5NBC235     CCCAAGGCACTTGCTGTTA          CCCTTCGAGAAAGAGGAAGG          50°C     2        391     76
Ya5NBC244     CCTATGGCTGAAACTTCTGAAACT     ATATCTTGGTCCACTAGACAAGCAC     60°C     18       453     130
Ya5NBC237(d)  CCCATGGAGGGTCTTTCCTA         CTGGAAACCATCCTTCACAGT         60°C     1        410     88

(a) Amplification of each locus required 2.5 min at 94°C initial denaturing, and 32
cycles of 1 min at 94°C, 1 min at the annealing temperature (A.T.), and 1 min
elongation at 72°C. A final extension time of 10 min at 72°C was also used.
(b) Chromosomal location determined from accession information or by PCR analysis of
NIGMS monochromosomal hybrid cell line DNA samples.
(c) Empty product sizes calculated by removing the Alu element and one direct repeat
from the filled sites that were identified.
(d) Alu Ya5a2 element of the FGFR2 gene.
Table 3. Alu Ya5a2 (NF1)-Associated Human Genomic Diversity

Ya5a2 elements  Accession no. (duplicates)  Position                     Allele frequency(a)
Ya5NBC206       AC004057                    76767-77048                  fixed present
Ya5NBC207       AL118555 (AL132992)         9981-9700 (40728-41009)      fixed present
Ya5NBC208       AL109919                    70170-69889                  intermediate
Ya5NBC220       AC007611                    136715-136434                intermediate
Ya5NBC240       AC133410 (AL135841)         34800-35081 (49829-49548)    intermediate
Ya5NBC241       AC018924                    144017-144298                intermediate
Ya5NBC242       AC009517                    161301-161582                intermediate
Ya5NBC7         AC004848                    24522-24241                  low
Ya5NBC205       AL011328                    204488-204207                low
Ya5NBC209       AC00808                     147056-146775                low
Ya5NBC239       AL133284                    115867-115586                low
Ya5NBC244       AC026839                    64885-64604                  low
Ya5NBC243       A)011929                    151192-151473                low
Ya5NBC235(b)    AQ748733                    458-739                      fixed present
Ya5NBC237(c)    AL031274                    33175-33501                  intermediate

(a) Allele frequency was classified as fixed present, fixed absent, low,
intermediate, or high frequency insertion polymorphism. (Fixed present) every
individual tested had the Alu element in both chromosomes; (low frequency insertion
polymorphism) the absence of the element from all individuals tested, except for one
or two homozygous or heterozygous individuals; (intermediate frequency insertion
polymorphism) the Alu element is variable as to its presence or absence in at least
one population; (high frequency insertion polymorphism) the element is present in
all individuals in the populations tested, except for one or two heterozygous or
absent individuals.
(b) Several Ns.
(c) Ya5NBC237 is the exact match to the FGFR2 Alu insertion.


Alu elements, which are addressed in the discussion section. However, we believe the
most likely explanation for the existence of these mosaic elements is gene
conversion. A limited amount of gene conversion between Yb8 Alu elements has been
reported previously (Batzer et al. 1995; Kass et al. 1995). In theory, gene
conversion may change the sequence of all or part of any Alu element in either an
evolutionarily forward (Ya5 subfamily in this case) or backward (Y subfamily)
direction by changing the diagnostic mutations. In addition, double gene conversions
would be extremely rare, making the direction of the gene conversion clear in some
elements. We classified the 91 mosaic Alu element sequences as gene converted
forward (f), backward (b), or could not be determined (-) (see Fig. 3). If the Alu
elements that fit the proposed sequential evolution are ignored in the analysis, all
of the other elements may be classified as backward gene conversion (32 total) or
could not be determined (33 total), and none were clearly gene-converted forward.
Therefore, backward gene conversion may have contributed to between 10% and 20%
(32 to 65/269 Ya5 + [91-17] Ya1-Ya4) of the Alu Ya5 sequence diversity.
Interestingly, evaluation of the five random Ya5a2 non-CpG mutations shows that one
mutation in position #13 is a backward mutation to the Y subfamily, another putative
example of a reverse gene conversion.

Table 4. Alu Ya5a2-Associated Human Genomic Diversity

              African American     Greenland natives    European             Egyptian             Avg.
Element       +/+ +/- -/-  fAlu    +/+ +/- -/-  fAlu    +/+ +/- -/-  fAlu    +/+ +/- -/-  fAlu    het.(c)
Ya5NBC206      20   0   0  1.000    20   0   0  1.000    20   0   0  1.000    20   0   0  1.000   0.000
Ya5NBC207      20   0   0  1.000    20   0   0  1.000    20   0   0  1.000    20   0   0  1.000   0.000
Ya5NBC208       4   1   7  0.375     3   0   4  0.429    13   0   6  0.684     7   0   5  0.583   0.482
Ya5NBC236       5   6   2  0.615     5   8   6  0.474    15   5   0  0.875     6   8   1  0.667   0.422
Ya5NBC240       5   1   9  0.367    11   0   4  0.733     5   1  10  0.344     5   3   3  0.591   0.464
Ya5NBC241       3   9   5  0.441     6  11   2  0.605     0   7  11  0.194     3   8   4  0.467   0.459
Ya5NBC242       2  13   1  0.531     7   4   3  0.643     3   4  11  0.278     3   3   1  0.643   0.474
Ya5NBC7         0   0  19  0.000     0   0  20  0.000     0   0  20  0.000     0   0  20  0.000   0.000
Ya5NBC205       0   0  20  0.000     0   0  20  0.000     0   0  20  0.000     0   0  20  0.000   0.000
Ya5NBC209       0   1  17  0.028     0   0  17  0.000     0   0  19  0.000     0   0  19  0.000   0.000
Ya5NBC239       0   0  20  0.000     0   0  20  0.000     0   0  20  0.000     0   0  20  0.000   0.000
Ya5NBC243       0   0  20  0.000     0   0  20  0.000     0   0  20  0.000     0   0  20  0.000   0.000
Ya5NBC220       0  14   5  0.368     1  15   2  0.472     0  18   1  0.474     0   9   2  0.409   0.502
Ya5NBC244       0   0  12  1.000     -   -   -    -       0   0  10  0.000     0   0   8  0.000   0.000
Ya5NBC235      20   0   0  1.000    20   0   0  1.000    20   0   0  1.000    20   0   0  1.000   0.000
Ya5NBC237(d)   18   1   0  0.974    15   4   0  0.895    20   0   0  1.000    18   1   0  0.974   0.075

(a) Genotypes: +/+ Alu, +/- Alu, -/- Alu.
(b) fAlu: frequency of the presence of the Alu.
(c) Average heterozygosity.
(d) Ya5NBC237 is the exact match to the FGFR2 Alu insertion.
-, not determined.

Figure 3 Evolution of the diagnostic nucleotide positions from Y to Ya5 Alu
elements. Alignment of the five Alu Ya5 diagnostic nucleotides as defined by Shen
et al. (1991) and the different Ya1, Ya2, Ya3, and Ya4 elements found in the nr
database. For easy reference, individual elements containing different combinations
of the diagnostic mutations were numbered consecutively in order of abundance
(Ya3.1, Ya3.2, etc.). Ya4.4 elements were considered as Ya5 elements in the first
Ya5 subfamily analysis in this paper. The total number of elements found for each
subgroup is indicated at left in parentheses. Potential forward (f) or backward (b)
gene conversions are indicated at right. The previously reported order of appearance
of Ya5 diagnostic mutations (Shaikh and Deininger 1996) is indicated below. Elements
with diagnostic mutations that follow the stepwise hierarchical accumulation are
circled.
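
Two pieces of reasoning in this section lend themselves to a short sketch: deciding whether a combination of diagnostic mutations fits the stepwise order of Shaikh and Deininger (1996), and the arithmetic behind the 10%-20% range. The code is illustrative only. In particular, treating #13/#14 and #12/#16 as unordered pairs (either member may appear first within its stage) is our assumption, and all names are our own.

```python
# Stepwise order quoted in the text: #13/#14 first, then #12/#16, then #11.
STAGES = [{13, 14}, {12, 16}, {11}]


def fits_sequential_order(present):
    """True if no later-stage mutation is present while an earlier stage is incomplete."""
    present = set(present)
    for index, stage in enumerate(STAGES):
        if not stage <= present:
            later = set().union(*STAGES[index + 1:])
            return not (present & later)
    return True


# Counts quoted in the text.
YA5_IN_NR = 269            # Ya5 elements, including the 17 Ya4.4 elements
INTERMEDIATE = 91          # Ya1-Ya4 elements
YA44_COUNTED_AS_YA5 = 17   # subtracted once to avoid double counting
BACKWARD = 32
UNDETERMINED = 33

total_elements = YA5_IN_NR + INTERMEDIATE - YA44_COUNTED_AS_YA5
low_estimate = BACKWARD / total_elements
high_estimate = (BACKWARD + UNDETERMINED) / total_elements

print(fits_sequential_order({12, 13, 14, 16}))  # a Ya4 element lacking #11
print(fits_sequential_order({11, 13, 14}))      # #11 present before #12/#16
print(f"{low_estimate:.1%} to {high_estimate:.1%} of {total_elements} elements")
```

Under these assumptions the lower bound counts only the elements classified as backward conversions, and the upper bound also includes those whose direction could not be determined.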

In  Search  of  Retroposition-Competent  Alu  Repeats 
Sixteen different Alu insertions have been linked to human diseases (Deininger and
Batzer 1999). Four belong to the Alu Y subfamily, one to the Ya4 subfamily, eight to
the Ya5 subfamily, and three to the Yb8 subfamily. Closer inspection of the
nucleotide sequences of these Alu elements shows that they have some mutations that
are different from their respective subfamily consensus sequences. Because these Alu
insertions are very recent in origin, they are likely to be identical to their
source genes aside from rare mutations introduced during reverse transcription of
the Alu element. Therefore, sequence database queries utilizing each Alu element
along with its individual mutations (away from the subfamily consensus sequence) may
facilitate the identification of the source Alu element that generated the copy.
This strategy is similar to that used previously in the identification of active
LINE elements from the human genome (Dombroski et al. 1993).

A database query using the sequence of the individual Alu elements responsible for
each disease to mine three databases (nr, htgs, and gss) identified exact
complements to four of the disease-associated Alu repeats. Thirteen of the
identified elements were exact matches to the NF1 Alu insertion (Ya5a2 subfamily,
Table 3; Wallace et al. 1991); three were exact matches to the BRCA2 Alu element
(Miki et al. 1996) (accession nos. AL121964, AL136319, and AL135778); one matched
the FGFR2 Alu repeat (Oldridge et al. 1999) (accession no. AL031274); and one
matched the Alu repeat in the IL2RG gene (Lester et al. 1997) (accession no.
AC010888).
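
The idea of screening sequence records for exact copies of a disease-associated element, in either orientation, can be sketched as follows. This is a toy stand-in for the BLAST searches described above, not the method used in the study: it uses plain substring matching, the record names are hypothetical, and the query is simply the first 20 nucleotides of the Ya5 consensus from Figure 1.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_exact_matches(query, records):
    """Return (record name, strand, 0-based start) for every exact occurrence.

    `records` maps a record name to its sequence. Both orientations of the
    query are searched; a palindromic query would be reported on both strands.
    """
    forward = query.upper()
    reverse = reverse_complement(forward)
    hits = []
    for name, seq in records.items():
        target = seq.upper()
        for strand, pattern in (("+", forward), ("-", reverse)):
            start = target.find(pattern)
            while start != -1:
                hits.append((name, strand, start))
                start = target.find(pattern, start + 1)
    return hits


query = "GGCCGGGCGCGGTGGCTCAC"  # first 20 nt of the Ya5 consensus (Fig. 1)
toy_records = {                 # hypothetical records for illustration only
    "toy_plus": "TTT" + query + "AAA",
    "toy_minus": "CC" + reverse_complement(query) + "GG",
    "toy_none": "ACGT" * 10,
}
hits = find_exact_matches(query, toy_records)
print(hits)
```

For genome-scale data an alignment tool is the practical choice; the sketch only makes the matching criterion (100% identity over the full element, on either strand) concrete.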

Potential Source Gene for the Ya5 Insert in FGFR2
As mentioned above, our BLAST query only detected one exact match (accession no.
AL031274 or Ya5NBC237) to the Ya5 Alu found in the FGFR2 gene that caused Apert
syndrome. We estimated the level of human genomic variation associated with
Ya5NBC237 using the same human DNA panel and determined that it was an intermediate
frequency Alu insertion polymorphism (Table 4).

Mobilization-competent Alu elements must be capable of transcription, the first step
in the retroposition process. To evaluate Alu Ya5NBC237 as a potential source gene
for the de novo insert in the patient with Apert syndrome, we determined its
transcription capability. Constructs with the genetic loci containing the Ya5NBC237
Alu and the de novo Apert syndrome Alu element were made. Transcription levels from
the two constructs were evaluated by Northern blot analysis relative to a control
plasmid in which the Alu element is flanked immediately upstream by vector sequence.

Transient transfections (Fig. 4) of the constructs into rodent cell line C6 (rat
glial tumor) were performed. Although the Alu element in the control plasmid has an
intact internal Pol III promoter, Alu transcripts are barely detectable from the
control plasmid. In contrast, the transcription from the Apert's Alu element and its
potential source gene were elevated three- to fourfold, as expected for putative
mobilization-competent Alu repeats. This result suggests that the genomic flanking
sequence of Ya5NBC237 probably makes the Alu transcription competent, one of the
several requirements of a source gene. The same results were obtained from
transfections in the human embryonic kidney cell line 293 (data not shown).

Figure 4 Evaluation of transcriptional capability of the potential FGFR2 source Ya5
Alu element. The transcriptional efficiency of the de novo FGFR2 Alu repeat and its
putative source gene were evaluated by Northern blot analysis from transient
transfection studies. The following constructs were evaluated: (lane 1) p-^^^Ap,
(lane 2) p-^^^Ya5NBC237, and (lane 3) p'^^Ya5NBC237. Lanes 4 and 5 are internal
control only, and no DNA controls, respectively. Small arrows indicate the Alu
transcripts and the open arrow indicates the internal control transcript. The ratio
of the Alu transcript/control transcript (numbers below) was normalized to the
p'^'’Ya5NBC237 transcription ratio, which was assigned the arbitrary value of 1.

DISCUSSION 

Our computational and experimental analyses of the Ya5 subfamily of Alu repeats
provide an overall picture of the most active of the recently integrated young Alu
subfamilies from the human genome. The analysis of Alu Ya5 repeats allowed us to
address a number of questions about the biology of these elements, such as the
potential impact of gene conversion events and the identification of Alu family
members from the human genome that may be capable of retroposition.

Alu elements have spread throughout the genome by retroposition over the last 65
million years. The master/source gene model (Batzer et al. 1990; Shen et al. 1991;
Deininger et al. 1992) posits that a very small subset of the >1,000,000 Alu
elements within the human genome are capable of high levels of retroposition,
although a much larger number may make a few copies. The formation of Alu
subfamilies may be explained by the sequential accumulation of mutations within the
active source gene(s) followed by proliferation of the mutated source elements. A
number of studies indicate that relatively few source Alu genes have played a
dominant role in the amplification and evolution of Alu elements (Shen et al. 1991;
Deininger et al. 1992; Deininger and Batzer 1993; Kapitonov and Jurka 1996).
Although retroposition is the primary mode of SINE mobilization and sequence
evolution through mutations in the source gene(s), our analysis suggests that gene
conversion and genetic instability of Alu-based simple sequence repeats have also
had a significant impact on the sequence architecture of this major family of human
genomic sequences.

There are several alternatives that could explain the occurrence of mosaic Alu
elements. First, some of the mosaic Alu elements with a single mutation could be
explained by the occurrence of parallel mutations. However, this seems unlikely
unless there were selection for these specific mutations, possibly through a
post-transcriptional selection process (Sinnett et al. 1992). It is also difficult
to envision a selection process that would only select for mutations at adjacent
diagnostic positions, such as we see here. Also, recombination between different Alu
elements could have generated some of these intermediate Alu elements that contain a
mosaic of diagnostic mutations. However, in many cases, multiple recombination
events would be required to obtain this outcome, making it highly unlikely. Although
there are alternative mechanisms, we believe gene conversion is the most likely
explanation for the occurrence of mosaic Alu elements.

The mechanisms of genome-wide gene conversion between mobile elements are not well
understood in humans (see Kass et al. 1995, and references therein). Our data show
that even the very short, dispersed Alu elements appear to be capable of high levels
of gene conversion, which usually involves only short sequence stretches. In
addition, our data show that reverse or backward gene conversions may be more
favored. It seems likely that higher levels of the Y element copy number (Shen et
al. 1991) or transcription (Shaikh et al. 1997) may play a role in determining the
directionality of the gene conversion events. Although older Alu subfamilies, such
as J and Sx, are present in higher copy numbers in the genome, they have diverged
greatly from their consensus sequences due to mutations that have accumulated
throughout evolution. Gene conversion would not be favored between such divergent
sequences. However, Alu Y elements tend to be more conserved (better matches to Ya5)
and have a high copy number (Batzer et al. 1995). Therefore, both abundance (genomic
copy number and/or transcript levels) and sequence identity appear to be influential
in the Alu gene conversion events observed.

There are multiple examples of gene conversion events in the literature. Genetic
exchange between exogenous and different endogenous mouse L1 elements has been
demonstrated previously to occur readily (Belmaaza et al. 1990). Kass et al. (1995)
reported previously a gene conversion event in which one of the oldest Alu family
members was converted to one of the youngest Alu subfamilies, Yb8. In addition, a
partially converted Yb8 Alu element was also reported previously by Batzer et al.
(1995). In yeast, some types of mobile elements spread through the genome by gene
converting pre-existing elements (Hoff et al. 1998). When we combine this type of
mobilization in the yeast genome with the Alu gene conversions reported previously,
as well as those in this paper, one could argue that gene conversion may represent a
second type of amplification mechanism for short interspersed elements in the human
genome. These observations suggest that evolutionary studies of all types of
interspersed elements that ignore gene conversion events may lead to biased
conclusions.

Variations in the length of the middle A-rich region and oligo-dA-rich tails of Alu
elements are not uncommon (Economou et al. 1990; Arcot et al. 1995b; Jurka and
Pethiyagoda 1995). Microsatellite repeats have been found to be associated with the
3' oligo(dA) tails and the middle A-rich region of Alu elements. In the case of
Friedreich ataxia, the most common mutation is the hyperexpansion of a GAA
trinucleotide repeat within the middle A-rich region of an Sx Alu (Montermini et al.
1997). However, microsatellites in the middle of Alu elements are not as common
because of the much shorter initial length of the middle A-rich region. Arcot et al.
(1995b) reported previously that only about one-fourth of the Alu elements
containing (AC)n repeats had them as a part of their middle A-rich region. The one
specific example they studied in detail had an evolutionary expansion of the A-rich
region (orangutan and gibbon) before the genesis of the AC repeat, suggesting the
requirement for an initial expansion. Interestingly, our large-scale analysis of the
middle A-rich regions of Ya5 elements demonstrates a trend toward expansion of the A
region, providing additional support for this region of the Alu elements to act as a
potential nucleus for the genesis of simple sequence repeats.

From our subset of 269 Alu Ya5 elements, we were able to identify a new Alu
subfamily termed Ya5a2. The estimated average age of 0.62 million years (0.28-1.08
million years with 95% confidence) makes Ya5a2 the youngest subfamily of Alu repeats
identified in the human genome to date. It is as abundant as the Ya8 subfamily (Roy
et al. 1999), and its higher level of insertion polymorphism suggests a higher level
of current retroposition. The Ya5a2 subfamily may have originated from a Ya5 Alu
element that inserted in a genomic region that favored transcription and
corresponding retroposition activity of the element, thereby generating a source
gene. The subsequent accumulation of the two specific mutations facilitated the
differentiation of the copies made by the Ya5a2 source gene from the larger
background of several hundred genomic Ya5 Alu family members. As new Alu elements
integrate into the genome in favorable genomic locations, they can occasionally
remain retropositionally competent and generate copies of themselves. However, the
fortuitous insertion of new Alu elements into favorable genomic locations for
subsequent mobilization is still a rare event, because the continuity of the
hierarchical subfamily sequence structure of the Alu elements is largely conserved
throughout primate evolution.

Alu elements that are polymorphic for insertion presence/absence have been proven
previously to be useful for the study of human population genetics and forensics
(Batzer et al. 1991; Jorde et al. 2000; Perna et al. 1992; Batzer et al. 1994;
Tishkoff et al. 1996; Stoneking et al. 1997). The identification of a very young Alu
subfamily with a high proportion of polymorphic members provides a new source of Alu
insertion polymorphisms for the study of human population genetics. However, it is
important to note that the Ya5a2 subfamily is extremely small (~35 copies total in a
background of >1,000,000), comparable with Ya8, so that an exhaustive analysis of a
single human genome would only generate ~20 polymorphic Ya5a2 elements.

Because our analysis of Alu elements related to the Apert's insertion only included
~40% of the human genome (both finished and draft sequence included), there are
possibly one or two other perfect complements in the human genome that have not yet
been sequenced and may be the actual source gene for these elements. The
transcriptional potential of this element would be consistent with its role as the
potential source Alu gene. This confirms the existence of minor active source genes
that differ from the source gene that generated almost all of the Alu elements
present in the human genome today. In addition, the de novo Apert's Alu element was
also transcriptionally active. There are two possible explanations for this result.
First, the transcriptional capacity of the elements was evaluated by transient
transfections in tissue culture. This system does not reflect the influence of
chromatin structure and methylation patterns (position effects) on the transcription
and presumably retroposition potential of the two Alu repeats. Alternatively, the de
novo Apert's Alu element may have inserted in a region of the FGFR2 gene that
fortuitously enhances its own transcription capability. Although further studies
will be required to make more definitive statements in this regard, the
transcriptional capability of Ya5NBC237 is consistent with one of the many
requirements a source gene possesses, making it a plausible candidate source gene
for the de novo Apert's insertion.

In  summary,  the  computational  analyses  of  a  sub¬ 
set  of  recently  integrated  Alu  elements  demonstrate 
that  Alu  sequence  evolution  is  affected  by  a  number  of 
dynamic  events.  New  retroposition-competent  Alu 
source  genes,  gene  conversion,  and  genetic  instability 
each  play  an  important  role  in  Alu  sequence  evolution 
and  proliferation  within  the  human  genome. 
METHODS

Computational  Analyses 

Screening of the GenBank nr, the htgs, and the gss databases
was performed by use of the Advanced Basic Local Alignment
Search Tool 2.0 (BLAST) (Altschul et al. 1990) available from
the National Center for Biotechnology Information
(http://www.ncbi.nlm.nih.gov/). For the Ya5 subfamily analysis, the
database was searched for matches to the 281 bases of the Ya5
consensus sequence with the following advanced options:
-e 1.0e-120, -b 1000, and -v 1000. A region composed of 500
bases of flanking DNA sequence directly adjacent to the sequences
identified from the databases that matched the initial
GenBank BLAST query was subjected to annotation by use of
either  RepeatMasker2  from  the  University  of  Washington  Ge¬ 
nome  Center  server  (http://ftp.genome.washington.edu/cgi- 
bin/RepeatMasker)  or  Censor  from  the  Genetic  Information 
Research  Institute  (http://www.girinst.org/Censor_Server- 
Data_Entry_Forms.html)  (Jurka  et  al.  1996).  These  programs 
annotate  the  repeat  sequence  content  of  DNA  sequences  from 
humans  and  rodents.  The  sequences  were  then  subjected  to 
more  detailed  analysis  by  use  of  MegAlign  (DNAStar  version 
3.1.7  for  Windows  3.2).  The  following  parameters  were  used 
to  select  the  Ya5  elements  to  be  analyzed:  (1)  Ya5  had  to  have 
all  five  diagnostic  nucleotides  (except  for  the  first  position,  as 
it  is  a  highly  mutable  CpG).  (2)  No  truncated  Alu  elements 
were  included  in  the  analysis.  (3)  No  Alu  elements  identified 
as  a  result  of  directed  cloning  strategies  designed  to  identify 
Alu  repeats  were  included  (only  those  randomly  found  within 
larger  data  sequence).  (4)  Duplicate  Alu  elements  were  elimi¬ 
nated  on  the  basis  of  flanking  sequences.  The  consensus  se¬ 
quences  of  the  Yb8  and  Ya8  subfamilies  were  used  for  parallel 
searches  of  the  three  GenBank  databases  mentioned  above.  A 
complete  list  of  the  Alu  elements  identified  from  the  GenBank 
search is available from M.A.B. or P.L.D. and at
http://www.genome.org/cgi/doi/10.1101/gr.152300.
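
The four selection rules above can be expressed as a simple filter. The following is a minimal sketch only: the record layout and all names (AluHit, select_ya5, and each field) are hypothetical and are not taken from the software actually used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AluHit:
    """One database match. All field names are hypothetical."""
    accession: str
    diagnostic_match: tuple  # five booleans, one per Ya5 diagnostic position
    full_length: bool        # False for truncated elements
    directed_clone: bool     # True if found by an Alu-targeted cloning strategy
    flank: str               # flanking sequence used to recognize duplicates


def select_ya5(hits):
    """Apply the four selection rules in order and return the retained hits."""
    kept = []
    seen_flanks = set()
    for hit in hits:
        # Rule 1: diagnostic positions 2-5 must match; position 1 is exempt
        # because it lies in a highly mutable CpG.
        if not all(hit.diagnostic_match[1:]):
            continue
        # Rule 2: no truncated elements.
        if not hit.full_length:
            continue
        # Rule 3: no elements recovered by directed cloning strategies.
        if hit.directed_clone:
            continue
        # Rule 4: drop duplicates, judged by identical flanking sequence.
        if hit.flank in seen_flanks:
            continue
        seen_flanks.add(hit.flank)
        kept.append(hit)
    return kept
```

In practice, duplicate detection on flanking sequence would need to tolerate sequencing differences; exact string identity is used here only to keep the sketch short.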

To  search  for  putative  source  genes  of  the  Alu  elements 
that  have  been  associated  previously  with  different  diseases, 
the  three  GenBank  databases  were  searched  by  use  of  the  se¬ 
quence  of  each  individual  repeat  to  identify  exact  comple¬ 
ments  (Deininger  and  Batzer  1999). 

DNA  Samples 

Human  DNA  samples  from  the  European,  African-American, 
Egyptian,  and  Greenland  native  population  groups  were  iso¬ 
lated  from  peripheral  blood  lymphocytes  (Ausubel  et  al.  1996) 
that  were  available  from  previous  studies  (Roy  et  al.  1999). 

Oligonucleotide  Primer  Design  and  PCR 
Amplification 

A region composed of ~500 bases of flanking unique DNA
sequence adjacent to each Alu repeat was used to design
primers  for  14  Ya5a2  Alu  elements  (13  exact  matches  to  con¬ 
sensus,  Table  2).  PCR  primers  were  designed  with  the  Primer3 
software  (Whitehead  Institute  for  Biomedical  Research) 
(http://www.genome.wi.mit.edu/cgi-bin/primer/primer3_ 
www.cgi).  The  resultant  PCR  primers  were  screened  against 
the  GenBank  nr  database  for  the  presence  of  repetitive  ele¬ 
ments  by  use  of  the  BLAST  program,  and  primers  that  resided 
within  known  repetitive  elements  were  discarded  and  new 
primers  were  designed.  PCR  amplification  was  carried  out  in 
25-µL reactions with 50-100 ng of target DNA, 40 pM of each
oligonucleotide primer, 200 µM dNTPs in 50 mM KCl, 1.5
mM MgCl2, 10 mM Tris-HCl (pH 8.4), and Taq DNA polymerase
(1.25 units) as recommended by the supplier (Life Technologies).
Each sample was subjected to the following amplification
cycle: an initial denaturation of 2:30 min at 94°C, 1
min of denaturation at 94°C, 1 min at the annealing temperature,
1 min of extension at 72°C, repeated for 32 cycles, followed
by a final extension at 72°C for 10 min. Twenty microliters
of each sample was fractionated on a 2% agarose gel
with 0.25 µg/mL ethidium bromide. PCR products were directly
visualized by UV fluorescence. The human genomic
diversity  associated  with  each  element  was  determined  by  the 
amplification  of  20  individuals  from  each  of  4  populations 
(African  American,  Greenland  native,  European,  and  Egyp¬ 
tian;  160  total  chromosomes).  The  chromosomal  location  for 
elements  identified  from  randomly  sequenced  large-insert 
clones  was  determined  by  PCR  analysis  of  National  Institute 
of  General  Medical  Sciences  (NIGMS)  human/rodent  somatic 
cell  hybrid  mapping  panels  1  and  2  (Coriell  Institute  for  Medi¬ 
cal  Research,  Camden,  NJ). 
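
The presence/absence genotypes scored on the gels translate directly into an insertion allele frequency. The sketch below shows one plausible way to tabulate it; the genotype coding ('+/+', '+/-', '-/-') and the function name are assumptions made for illustration, not a description of the analysis software used here.

```python
from collections import Counter

# Number of insertion-bearing chromosomes contributed by each genotype.
INSERTION_COPIES = {"+/+": 2, "+/-": 1, "-/-": 0}


def insertion_frequency(genotypes):
    """Return the fraction of chromosomes carrying the Alu insertion.

    `genotypes` is a list with one entry per individual, each entry being
    '+/+' (homozygous present), '+/-' (heterozygous), or '-/-' (absent).
    """
    counts = Counter(genotypes)
    unknown = set(counts) - set(INSERTION_COPIES)
    if unknown:
        raise ValueError(f"unrecognized genotype codes: {sorted(unknown)}")
    chromosomes = 2 * len(genotypes)  # two chromosomes per individual
    if chromosomes == 0:
        raise ValueError("no genotypes supplied")
    inserted = sum(INSERTION_COPIES[g] * n for g, n in counts.items())
    return inserted / chromosomes
```

With 20 individuals in each of 4 populations, the denominator summed over all groups is 2 x 20 x 4 = 160 chromosomes, matching the figure given in the text.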

Construction  of  Plasmids 

The following constructs were made: p416Ya5NBC237 (416
bp upstream genomic - Alu - 223 bases downstream);
p290Ya5Ap (290 bp upstream genomic - Alu - 293 bases); and
pNPYa5NBC237 (no upstream vector flank - Alu - 223 bases).
Unless otherwise noted, PCR was performed in 20-µL reactions
by use of an MJ Research PTC 200 thermal cycler with
the following conditions: 1× Promega buffer, 1.5 mM MgCl2,
200 µM dNTPs, 0.25 µM primers, 1.5 units of Taq polymerase
(Promega) at 94°C for 2 min; 94°C for 20 sec, 55°C (annealing
temperature) for 20 sec, 72°C for 1 min, for 30 cycles; 72°C for
3 min. To PCR amplify and clone the 864-bp fragment containing
the de novo Alu Ya5 from Apert syndrome patient 1
(accession no. AF097344), the following primers were used:
forward, 5'-GGTGTGGCCAAAGTGGAGGATGTGTAC-3' and
reverse, 5'-TTATTCAAGGATAAAAGGGGCCATTTC-3' with
an annealing temperature of 50°C; and for the 920-bp fragment
containing AluYa5NBC237 (accession no. AL031274)
the primers used were: forward,
5'-TTATTCCATTGGTCCTTTCCACCAG-3' and reverse,
5'-CAGGCAGGGAGGTACTTGTCTCTTG-3' with an annealing temperature of 55°C.

For the pNPYa5NBC237, PCR amplification from the
clone was done with the same reverse primer and the FAlu5
primer 5'-GGCCGGGCGCGGTGGCTCA-3'.

The  final  PCR  product  of  the  complete  construct  was 
cloned  into  pGEMTeasy  Vector  System  I  (Promega).  Con¬ 
structs  were  subjected  to  DNA  sequence  analysis  to  verify 
their  sequence  context.  Purified  plasmids  from  the  constructs 
were  prepared  by  alkaline  lysis  of  bacterial  cells  followed  by 
banding in a CsCl gradient twice. DNA concentrations were
determined spectrophotometrically by use of A260 and verified
by visual examination of ethidium bromide-stained agarose
gels.

Alu Transcription in Cell Lines and RNA Analysis

Transient  transfections  were  carried  out  in  the  rodent  cell  line 
C6 glioma (ATCC CCL107). Monolayers were grown to 50%-70%
confluency and transfected with 3 µg of the construct-containing
plasmid and 1 µg of control plasmid (p^^^ BCl) by
use  of  LipofectAmine  Plus  (GIBCO  Life  Sciences)  following 
the  manufacturer's  recommended  protocol.  Total  RNA  was 
isolated  16-20  h  post-transfection. 

RNA was extracted from cell lines utilizing the Trizol Reagent
(Life Technologies, Inc.) according to the manufacturer's
protocol. Equal amounts of RNA were fractionated on a
2% agarose-formaldehyde gel and then transferred to a nylon
membrane, Hybond-N (Amersham). Northern blots were hybridized
utilizing the following end-labeled oligonucleotide
probes: unique-1 5'-TGTGTGTGCCAGTTACCTTG-3'
(complementary to the 3' end of the control plasmid) and
AluYA5-1 5'-ACCGTTTTAGCCGGGAATGGTC-3' (complementary
to Ya5 Alu RNA, but not to 7SL) in 5x SSC, 5x
Denhardt's, 1% SDS, and 100 µg/mL herring sperm DNA.
Oligonucleotides were end labeled by incorporating [γ-32P]ATP
(Amersham) with T4 polynucleotide kinase (New England
BioLabs), and subsequently separated from free label by filtration
through a Sephadex G-50 column. Blots were washed
three times at 45°C with a low stringency buffer (2x SSC and
1% SDS) and subjected to autoradiography or quantified with
a FujiFilm FLA-2000 fluorescent image analyzer (Fuji Photo
Film Co. LTD). Statistical analysis was performed with the
Jandel SigmaStat Statistical Software Version 2 (Jandel
Corporation).
Alu elements have amplified in primate genomes
through an RNA-dependent mechanism, termed
retroposition, and have reached a copy number in excess
of 500,000 copies per human genome. These elements
have been proposed to have a number of functions in
the human genome, and have certainly had a major
impact on genomic architecture. Alu elements continue
to amplify at a rate of about one insertion every
200 new births. We have found 16 examples of diseases
caused by the insertion of Alu elements, suggesting
that they may contribute to about 0.1% of human
genetic disorders by this mechanism. The large
number of Alu elements within primate genomes also
provides abundant opportunities for unequal homologous
recombination events. These events often occur
intrachromosomally, resulting in deletion or duplication
of exons in a gene, but they also can occur
interchromosomally, causing more complex chromosomal
abnormalities. We have found 33 cases of germ-line
genetic diseases and 16 cases of cancer caused by
unequal homologous recombination between Alu repeats.
We estimate that this mode of mutagenesis accounts
for another 0.3% of human genetic diseases.
Between these different mechanisms, Alu elements
have not only contributed a great deal to the evolution
of the genome but also continue to contribute to
a significant portion of human genetic diseases. © 1999
THE SPREAD OF Alu ELEMENTS IN THE
HUMAN  GENOME 

Alu elements are sequences of approximately
300 nucleotides (nt) that are transcribed
by RNA polymerase III. The RNA transcript
is  then  reverse-transcribed  and  inserted  into  a  new 
location  in  the  genome.  This  RNA-mediated  process 
for  making  new  copies  of  the  element  is  termed 
retroposition  (1).  Different  Alu  elements  in  the  ge¬ 
nome  are  not  identical  to  one  another.  It  appears 
that  Alu  elements  that  have  integrated  recently 
within  the  genome  are  quite  homogeneous,  and  al¬ 
most  exact  copies  of  one  another  (2).  However,  the 
older  copies  have  accumulated  random  mutations, 
making  them  typically  divergent  by  20%  or  more 
from  one  another  at  the  sequence  level  (3). 
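
Sequence divergence of the kind quoted above is usually reported as the percentage of mismatched positions between two aligned copies. The following is a minimal sketch under stated assumptions: the two sequences are already aligned to equal length, gap columns (marked '-') are skipped, and the function name is hypothetical.

```python
def percent_divergence(seq_a, seq_b):
    """Percentage of mismatches between two pre-aligned sequences.

    Columns in which either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared
```

Two recently inserted copies would be expected to score near 0 on such a measure, whereas two old copies would score near 20 or above.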

Alu  elements  began  inserting  early  in  primate 
evolution,  approximately  65  mya  (3).  Although  there 
are  some  related  elements  in  mammals  outside  of 
the  primate  order,  they  do  not  have  the  specific 
structure of Alu elements. The rate of Alu amplification
appears to have reached a maximum between
35 and 60 mya; amplification currently proceeds at only
1% of that maximum rate. There are probably only
about  2000  Alus  specific  to  the  human  genome,  and 
not  found  in  chimpanzee  and  gorilla.  Thus,  about 
99.8%  of  the  500,000  Alus  in  the  human  genome  can 
TABLE 1

Alu Insertions and Disease

Locus           Distribution         Subfamily   Disease                                   Reference
CaR             Familial             Ya4         Hypocalciuric hypercalcemia and           (51)
                                                 neonatal severe hyperparathyroidism
Mlvi-2          De novo (somatic?)   Ya5         Associated with leukemia                  (52)
NF1             De novo              Ya5         Neurofibromatosis                         (53)
PROGINS         About 50%            Ya5         Linked with ovarian carcinoma             (54)
IL2RG           Familial             Ya5         XSCID                                     (55)
ACE             About 50%            Ya5         Linked with protection from heart disease (35)
Factor IX       A grandparent        Ya5         Hemophilia                                (56)
EYA1            De novo              Ya5         Branchio-oto-renal syndrome               (57)
2 × FGFR2       De novo              Ya5 & Yb8   Apert's syndrome                          (41)
Cholinesterase  One Japanese family  Yb8         Cholinesterase deficiency                 (58)
APC             Familial             Yb8         Hereditary desmoid disease                (59)
Btk             Familial             Y           X-linked agammaglobulinaemia              (55)
C1 inhibitor    De novo              Y           Complement deficiency                     (60)
BRCA2           De novo              Y           Breast cancer                             (61)
GK              ?                    Y           Glycerol kinase deficiency                (62)

be  found  at  the  same  locus  in  all  of  the  great  apes, 
and  85%  of  the  elements  at  specific  loci  can  be  found 
in  all  monkeys.  Our  best  estimates  of  Alu  amplifica¬ 
tion  in  the  human  genome  are  that  there  is  one  new 
insert  in  about  every  200  new  births  (4).  Although 
this  is  well  below  the  peak  rate,  it  is  still  high 
enough  to  represent  a  significant  factor  in  human 
mutagenesis. 

In  addition  to  random  mutations,  which  occur  to 
Alu  elements  after  their  insertion  in  the  genome, 
there  are  specific  base  changes  that  allow  separation 
of  Alu  elements  into  different  subfamilies  (5-10). 
The  different  subfamilies  were  all  inserted  at  differ¬ 
ent  stages  of  primate  evolution.  Almost  all  of  the 
insertions  that  have  occurred  specifically  in  the  hu¬ 
man  genome  come  from  four  closely  related  subfam¬ 
ilies,  Alu  Y,  Ya5,  Ya8,  and  Yb8.  Ya5  and  Yb8  inserts 
represent  the  majority  of  the  inserts  and  Alu  Y  in¬ 
serts  are  relatively  rare.  All  of  the  new  inserts  be¬ 
long  to  a  small  group  of  the  most  recently  created 
subfamilies  (see  Table  1).  This  demonstrates  that 
only  a  small  subset  of  Alus  is  capable  of  amplifica¬ 
tion  (11). 

Several  explanations  for  the  selective  amplifica¬ 
tion  of  specific  subfamilies  have  been  proposed.  One 
likely  explanation  is  that  a  few  specific  loci  are  ca¬ 
pable  of  active  amplification,  while  almost  all  other 
loci  are  not,  and  that  there  are  almost  no  such  loci  in 
the  older  subfamilies  (11).  Alternatively,  one  has  to 
propose  that  loci  from  all  subfamilies  express,  but 
that the RNAs expressed from the newer subfamilies
interact with the retroposition apparatus much
better  than  the  older  subfamily  RNAs  (12,13). 

Alus AND L1 ELEMENTS

The other major mobile element in the human
genome is the L1 element. Alu elements are RNA
polymerase III-derived transcripts that have no coding
capacity. Thus, they do not code for any proteins
that might be involved in the retroposition process.
L1 repeats, on the other hand, are much longer and
have two open-reading frames (reviewed in (14)).
One open-reading frame apparently codes for an
RNA-binding protein whose exact function is unknown.
The other open-reading frame codes for a
protein that includes domains for reverse transcriptase,
as well as for an endonuclease that apparently
nicks the genome at the site of insertion (15-17). An
assay that allows rapid L1 retroposition in cultured
cells has been devised recently (18). This assay
facilitates the dissection of the details of the L1
retroposition mechanism.

Alu elements must obtain the enzymes for their
retroposition from somewhere. In addition, there are
striking similarities between the mechanisms of Alu
and L1 retroposition that make it very attractive to
think that L1 elements may supply the necessary
components for Alu retroposition (15,16,19,20). If
so, the rate of Alu retroposition may depend strongly
on the rate and evolution of L1 elements.


Alu  REPEATS  AND  HUMAN  DISEASE 


Alu  ELEMENTS:  FUNCTIONAL  ROLE  OR  A 
PARASITE’S  PARASITE 

Alu  repeats  represent  over  5%  of  the  mass  of  the 
human  genome.  They  are  also  spread  throughout 
the entire genome, at varying densities. These observations,
along with other specific properties of
the  Alu  elements  have  led  to  a  number  of  hypothet¬ 
ical  functions  for  the  Alu  elements  that  might  ex¬ 
plain  their  ubiquitous  presence  in  primate  genomes. 
Some  of  the  proposed  roles  involve  an  everyday  func¬ 
tion  for  the  cell,  while  others  are  of  a  more  sporadic 
nature. 

The  first  role  ever  proposed  for  Alu  elements  was 
that  they  might  be  origins  of  DNA  replication  (21). 
This  role  is  consistent  with  their  high  copy  number 
and  dispersed  nature,  but  has  not  been  substanti¬ 
ated  by  direct  experimentation  and  seems  like  too 
important  a  function  to  be  served  by  an  element  that 
is  not  found  outside  of  primates. 

More  recently,  evidence  has  been  presented  that 
Alu  RNAs  may  stimulate  protein  translation  by  in¬ 
hibiting  a  RNA-dependent  protein  kinase,  PKR  (22- 
24).  Because  Alu  RNAs  from  many  loci  are  stimu¬ 
lated  by  a  number  of  cellular  stresses,  such  as  viral 
infection  and  heat  shock,  this  would  provide  a  mech¬ 
anism  by  which  dispersed  sequences  may  contribute 
to  a  cellular  process  as  a  group.  If  this  is  a  function 
of  Alu  elements,  then  it  is  likely  to  represent  only  a 
slightly  modified  regulation  seen  in  nonprimate  spe¬ 
cies  that  is  filled  by  other  RNAs  or  molecules  in 
those  species. 

Evidence  has  been  presented  in  yeast  that  retro- 
transposable  elements  may  aid  in  healing  chromo¬ 
somal  breaks  (25,26).  This  suggests  the  possibility 
that Alu and L1 elements may play the same role
in  the  human  genome. 

There  are  several  thoughts  concerning  the  possi¬ 
ble  roles  of  Alu  elements  in  the  evolution  of  the 
human  genome.  As  discussed  below,  Alu  elements 
can  lead  to  unequal  recombination  that  results  in 
deletion  or  duplication  of  sequences.  These  events 
could  allow  duplication  of  exons  and  therefore  for¬ 
mation  of  new  protein  variants.  They  can  also  con¬ 
tribute  to  interchromosomal  recombination  that 
may  lead  to  cytogenetic  alterations  that  are  involved 
in  human  speciation. 
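
The way a single misaligned crossover between two Alu repeats yields one deleted and one duplicated product can be sketched with a gene represented as an ordered list of segments. This is a deliberately simplified illustration: both chromatids are assumed to carry the same layout, each Alu is treated as one indivisible block (the hybrid Alu formed at the breakpoint is not modeled), and the function and segment names are hypothetical.

```python
def unequal_crossover(chromatid, alu_left, alu_right):
    """Return (deletion_product, duplication_product).

    The upstream Alu (`alu_left`) on one chromatid pairs with the
    downstream Alu (`alu_right`) on the other, and exchange occurs
    within the paired repeats.
    """
    i = chromatid.index(alu_left)
    j = chromatid.index(alu_right)
    if i >= j:
        raise ValueError("alu_left must lie upstream of alu_right")
    # Chromatid 1 up to the upstream Alu, joined to chromatid 2 after the
    # downstream Alu: everything between the two repeats is lost.
    deletion = chromatid[: i + 1] + chromatid[j + 1:]
    # Chromatid 2 up to the downstream Alu, joined to chromatid 1 after
    # the upstream Alu: the intervening region is present twice.
    duplication = chromatid[: j + 1] + chromatid[i + 1:]
    return deletion, duplication


gene = ["exon1", "Alu1", "exon2", "Alu2", "exon3"]
deleted, duplicated = unequal_crossover(gene, "Alu1", "Alu2")
```

Here `deleted` lacks exon2 and `duplicated` carries it twice, the two outcomes described in the text.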

There  are  also  several  ways  in  which  Alu  re¬ 
peats  have  been  proposed  to  influence  the  evolu¬ 
tion  of  gene  expression.  Because  Alu  elements  are 
rich in CpG dinucleotides that represent the substrate
for genomic methylation, Alu elements represent
CpG-rich islands that make up about 30%
of  the  methylation  sites  in  the  human  genome 
(24).  When  an  Alu  element  inserts  in  a  new  loca¬ 
tion  in  the  genome,  it  introduces  a  CpG  island  at 
that  new  location.  CpG  islands  have  been  associ¬ 
ated  with  gene  regulation,  as  well  as  imprinting  of 
genes,  and  therefore  Alu  elements  may  contribute 
to  the  evolution  of  gene  expression  and  imprinting 
in  the  human  genome.  In  addition,  Alu  elements 
have  been  found  to  carry  functional  promoter  ele¬ 
ments  for  several  of  the  steroid  hormone  receptors 
(27,28).  Thus,  insertion  of  a  new  Alu  element  in 
the  vicinity  of  a  gene  may  introduce  new  tran¬ 
scription  factor-binding  sites  that  could  alter  the 
regulation  of  gene  expression.  There  are  a  number 
of  cases  where  elements  that  influence  gene  ex¬ 
pression  have  been  mapped  to  within  an  Alu  re¬ 
peat  (29),  demonstrating  that  the  introduction  of 
these  sequences  can  at  least  occasionally  contrib¬ 
ute  to  gene  expression  and  regulation. 

Although there are numerous cases where individual
Alu elements have had a positive impact on
the human genome, it might be argued that none
of them has been confirmed as a function. In this
sense we would not define something that happens
in a positive sense every few thousand years as
being a function, because it would be occurring too
sporadically to apply a positive selection for the
presence of Alu elements. In addition, studies of
individual Alu elements demonstrate that there is
essentially no selective pressure on any given Alu
repeat, although it is possible that selection does
exist for a handful of master elements. Thus, it
has been argued that Alu and L1 elements may
both represent “selfish” DNA, or DNA that is only
working to replicate itself. Selfish DNA may often
have negative impacts on the host, but can be
tolerated if it does not have too strong an adverse
effect. Selfish DNA may also occasionally have
positive benefits, but only by chance, and not by
functional design. If L1 elements are essentially a
parasite within the human genome, and if Alu
relies on L1 elements for its amplification process,
then one might describe Alu as a “parasite’s
parasite.”

Alus  AS  MARKERS  FOR  HUMAN 
DIVERSITY 

Although  there  is  still  a  question  as  to  whether 
there  is  a  true  functional  role  for  Alu  elements  in  the 
human  genome,  Alu  elements  have  proved  to  be 
useful in studies of human DNA. The presence of
Alu  repeats  located  ubiquitously  throughout  the  hu¬ 
man  genome,  but  not  in  nonprimate  species,  has 
allowed  detection  of  human  DNA  sequences  that 
have  been  transfected  into  the  cells  of  other  organ¬ 
isms,  such  as  mice.  This  has  been  useful  in  marker- 
rescue  experiments  in  isolating  a  number  of  genes, 
including  the  first  examples  of  oncogenes  isolated  by 
transforming  rodent  cell  lines  with  human  tumor 
DNAs  (30).  More  recently,  inter-Alu  PCR  (31,32)  has 
found  a  broad  range  of  uses  in  isolating  specific 
human  DNA  regions  from  mouse/human  hybrid  cell 
lines  and  other  complex  sources  containing  large 
segments  of  human  DNA. 

Recent  Alu  insertions  have  also  proven  useful  in  a 
number  of  human  population  studies.  In  particular, 
there  are  over  1000  Alu  insertions  that  occurred 
recently  enough  to  be  present  only  in  a  subset  of 
human  chromosomes.  Because  there  does  not  seem 
to  be  any  specific  mechanism  for  removing  Alu  ele¬ 
ments  from  the  genome,  once  inserted  they  make  a 
very  stable  genetic  marker  (33,34).  This  observation, 
along  with  the  extremely  low  probability  that  any 
two  recently  integrated  elements  have  inserted  in¬ 
dependently  in  the  same  chromosomal  location, 
makes  Alu  insertions  one  of  the  best  identical-by¬ 
descent  (IBD)  markers  for  human  evolution  studies. 
Any  two  individuals  sharing  an  Alu  insert  almost 
certainly  do  so  because  they  share  a  common  ances¬ 
tor  in  which  the  insertion  occurred.  Table  1  includes 
an  example  of  an  Alu  insertion  in  the  angiotensin¬ 
converting  enzyme  (ACE)  locus  that  shows  a  useful 
association  with  protective  advantages  from  heart 
disease  (35).  Many  other  Alu  insertion  polymor¬ 
phisms  have  been  identified  either  in  random 
genomic  loci  or  in  specific  genes,  but  without  any 
known  disease  association.  These  Alu  insertions  are 
easy  to  assay  for  their  presence  or  absence  in  a 
chromosomal  location  and  have  been  found  to  be 
very  powerful  markers  for  human  forensic  and  mo¬ 
lecular  anthropology  studies  (36,37). 

RETROPOSITION  OF  Alu  ELEMENTS 
AND  DISEASE 

Alu  elements  are  located  throughout  the  genome 
and  in  almost  any  location  within  a  gene  except 
those  in  which  they  would  totally  disrupt  the  func¬ 
tion  of  that  gene.  Figure  1  illustrates  some  of  the 
positions  relative  to  a  typical  gene  structure  in 
which Alu may land. Alus landing far enough upstream
of a gene may have no influence on that
gene’s expression. However, Alus landing in or near
the  promoter/enhancer  regions  of  a  gene  have  been 
found  to  influence  the  expression  of  specific  genes 
(reviewed  in  (29)),  as  well  as  to  have  the  general 
potential  to  add  transcription  elements,  like  steroid 
hormone  receptor  elements  (27,28),  to  the  upstream 
gene  region. 

Very  few  Alu  elements  are  found  within  the  5' 
noncoding  or  coding  regions  of  exons,  presumably 
because  insertions  in  those  locations  are  too  disrup¬ 
tive  to  gene  function.  There  are  a  number  of  in¬ 
stances  where  Alu  elements  have  been  found  to  be 
part  of  the  region  coding  for  the  carboxy-terminus  of 
a  protein  product  (38,39).  Presumably  these  Alus 
insert  far  enough  downstream  in  the  coding  se¬ 
quence  to  result  in  a  new  carboxy-terminus  that 
does  not  disrupt  the  structure  of  the  protein. 

Insertions into the 3' noncoding regions of genes
are found commonly and appear to have few negative
effects. Similarly, Alus are commonly found in
introns, demonstrating that Alu insertions in much
of the intronic region do not alter gene function
significantly.

The  vast  majority  of  Alu  insertions  that  have  led 
to  human  disease  insert  into  coding  exons,  or  into 
introns  relatively  near  an  exon  and  presumably  al¬ 
ter  splicing.  Table  1  is  a  list  of  the  genetic  defects 
that  are  thought  to  be  caused  by  Alu  insertion 
events.  Not  all  of  these  cases  have  been  demon¬ 
strated  to  be  directly  causative  for  the  disease,  but 
the  rarity  of  Alu  insertion  events,  coupled  with  the 
lack  of  other  detectable  mutations  in  these  cases, 
strongly  indicates  that  these  are  the  causative 
events.  The  ACE  insertion  (35,40)  is  likely  to  be  one 
example,  however,  that  shows  association  with  dis¬ 
ease,  but  is  highly  unlikely  to  be  the  causative  event. 

The  above  examples  demonstrate  that  Alu  inser¬ 
tions  are  capable  of  causing  genetic  defects  which 
lead  to  human  disease.  Examples  of  this  type  are 
being  found  at  an  increasing  frequency  as  the  tools 
for  genetic  analysis  allow  more  mutations  to  be  de¬ 
tected.  Finding  16  Alu-based  insertion  mutations  in 
the  Human  Genetic  Mutation  Database  that  con¬ 
tains  14374  characterized  human  mutations  sug¬ 
gests  that  Alu  elements  contribute  to  approximately 
0.1%  of  human  genetic  diseases.  This  number  agrees 
well  with  a  previous  calculation  based  on  a  similar 
dataset of mutations where Alu and L1 insertions
were  estimated  to  each  contribute  approximately 
0.075%  of  human  mutations  (16).  In  some  cases,  the 
insertional  mutagenesis  may  make  detection  of  mu¬ 
tations  easier,  biasing  the  results  in  favor  of  the 
FIG. 1. Schematic of Alu-induced damage to the human genome. Panel A illustrates some of the potential consequences of insertion
of  a  new  element  in  the  vicinity  of  a  gene.  The  colored  boxes  represent  various  exons  of  the  gene.  The  red  arrows  show  existing  Alu 
elements  oriented  in  different  directions  in  the  introns  of  the  gene.  Depending  on  the  site  of  insertion,  the  Alu  element  has  varied 
probability  of  impact  on  the  genome  as  shown.  Panel  B  illustrates  an  unequal,  homologous  recombination  occurring  between  two  Alu 
elements  in  different  introns  of  a  gene.  The  arrows  broken  by  dotted  lines  show  the  path  of  the  recombination  event.  The  genes  below  show 
that  one  copy  will  have  a  deletion  while  the  other  will  duplicate  gene  sequences.  Either  is  likely  to  be  deleterious. 


detection  of  Alu  insertions.  However,  many  muta¬ 
tion  detection  strategies  are  designed  to  identify 
point  mutations,  particularly  in  coding  regions,  and 
may  overlook  insertions,  particularly  if  they  occur  in 
introns.  In  addition,  many  new  mobile  element  in¬ 
sertions  may  be  lethal  during  embryogenesis.  There¬ 
fore,  it  is  likely  that  these  estimates  of  insertion 
frequencies  are  underestimates  of  the  true  contribu¬ 
tion  of  new  Alu  insertions  to  human  disease. 
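
The approximately 0.1% figure follows directly from the two counts given above (16 Alu insertion mutations among 14374 characterized mutations). A minimal sketch of the arithmetic, using a hypothetical helper name:

```python
def percent_of_mutations(event_count, total_mutations):
    """Express `event_count` as a percentage of `total_mutations`."""
    if total_mutations <= 0:
        raise ValueError("total_mutations must be positive")
    return 100.0 * event_count / total_mutations


# Counts quoted in the text for the Human Genetic Mutation Database.
alu_insertion_share = percent_of_mutations(16, 14374)
```

As the text notes, ascertainment bias can push this share in either direction, so the value is best read as an order-of-magnitude estimate.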

We expect that with increasing study of mutations,
it will be found that some genetic diseases are
more likely than others to result from retroposon
insertion.  It  has  certainly  been  observed  that  some 
genes  have  a  much  higher  Alu  repeat  content,  mak¬ 
ing  it  reasonable  that  they  will  have  a  higher  fre¬ 
quency  of  disabling  Alu  insertions.  It  has  been  ob¬ 
served  that  2  out  of  258  mutations  in  the  FGFR2 
gene  were  caused  by  Alu  insertions  (41).  This  is  the 
first  case  of  multiple  Alu  insertion  mutations  being 
detected  associated  with  a  single  disease,  suggesting 
that  this  genetic  locus  may  be  more  susceptible  to 
retroposon  insertions  than  other  regions  of  the  ge- 
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TABLE  2 

Alu/Alu  Recombination  and  Germ-Line  Disease 


Locus 

Distribution 

Disease 

Reference 

8  X  LDLR 

Kindreds 

Hypercholesterolemia 

(63-67) 

5  X  a-globin 

Kindreds 

a-thalassaemia 

(68-71) 

5  X  Cl  inhibitor 

Kindred 

Angioneurotic  adema 

(60,72) 

Lys  Hydrox. 

Kindreds 

Ehlers-Danlos  syndrome 

(73) 

DMD 

Kindred 

Duchenne’s  muscular  dystropy 

(74) 

ADA 

One  patient 

ADA  deficiency-SCID 

(75) 

Apo  B 

One  patient 

Hypo-betalipoproteinemia 

(76) 

Ins.  Rec,  /3 

One  patient 

Insulin-independent  diabetes 

(77) 

a-gal  A 

One  patient 

Fabry  disease 

(78) 

HPRT 

One  patient 

Lesch-Nyhan  syndrome 

(79) 

Plat.  Fibrinogen  Receptor 

Kindred 

Glanzmann  thrombasthenia 

(80) 

Phosphorylase  kinase 

One  patient 

Glycogen  storage  disease 

(81) 

GALNS 

One  patient 

Mucopolysaccharidosis  type  IVA 

(82) 

Antithrombin 

One  patient 

Thrombophilia 

(83) 

XY 

One  patient 

XX  male 

(84) 

/3-HEXA 

Classic  form  of  disease 

Tay  Sachs 

(85) 

C3 

Kindred 

C3  deficiency 

(86) 

HEXB 

27%  of  patients 

Sandhoffs  disease 

(87) 

nome.  However,  the  number  of  insertions  found  so 
far  is  still  fairly  low  making  more  definitive  conclu¬ 
sions  difficult. 

RECOMBINATION  BETWEEN  Alu 
ELEMENTS  ASSOCIATED  WITH  DISEASE 

In addition to the potential impact of Alu element insertions in causing human disease, their dispersion throughout the genome provides ample opportunity for unequal homologous recombination which leads to a much higher level of mutations. Figure 1B illustrates how this unequal recombination can cause insertion or deletion mutations. When recombination occurs between Alu elements on the same chromosome, the result is that there is either duplication or deletion of the sequences between the Alus. Recombination may also occur between Alu elements on different chromosomes, resulting in chromosomal translocations or more complex chromosomal rearrangements.
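
The outcome of an unequal crossover between two Alu elements in the same gene can be sketched with a toy model. The following is only an illustration under simplifying assumptions: the gene is a list of named segments, the two participating copies of the locus are identical, and the function name `unequal_crossover` and the segment labels are hypothetical.

```python
def unequal_crossover(homolog_a, homolog_b, idx_a, idx_b, repeat="Alu"):
    """Cross over between the repeat at idx_a on homolog_a and the repeat at idx_b on homolog_b."""
    if homolog_a[idx_a] != repeat or homolog_b[idx_b] != repeat:
        raise ValueError("both crossover points must fall in a copy of the repeat")
    # Each product follows one copy up to and including the crossover repeat,
    # then continues along the other copy.
    product_1 = homolog_a[: idx_a + 1] + homolog_b[idx_b + 1 :]
    product_2 = homolog_b[: idx_b + 1] + homolog_a[idx_a + 1 :]
    return product_1, product_2


# Toy gene: three exons with an Alu element in each intron (labels are illustrative).
gene = ["exon1", "Alu", "exon2", "Alu", "exon3"]

# Misaligned pairing: the first Alu of one copy pairs with the second Alu of the other.
deleted, duplicated = unequal_crossover(gene, gene, idx_a=1, idx_b=3)
print(deleted)      # one product has lost exon2
print(duplicated)   # the reciprocal product carries exon2 twice
```

One product loses the sequence between the two Alu elements and the reciprocal product gains a second copy of it, which is the deletion/duplication pair described in the text.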

Table 2 presents a compilation of Alu/Alu recombination events that have contributed to germ-line disease, with Alu-based recombination events associated with cancer shown in Table 3. There are many more recombination than insertion events contributing to disease, and the table of recombination events is not intended to be exhaustive in presenting all of the Alu/Alu recombinations that have contributed to human disease. In addition, there are many recombination events that occurred between an Alu element and some other non-Alu-related sequence which may have been influenced by the presence of the Alu element (42). Although single Alu elements may contribute specifically to such recombination events, we have made no efforts to collect those data. The mutations resulting from Alu/Alu recombination include 33 mutations that are the result of germ-line recombination and 16 mutations that are the result of somatic events that led to cancer. Based on the calculations in the previous section, the germ-line recombination mutants would represent about 0.3% of mutants characterized. We expect that this number is an underestimate as mutation schemes aimed at detecting point mutants would often be expected to overlook large duplication and deletion events, and we have probably not reported all known Alu/Alu recombinations in the tables.

The data in Tables 2 and 3 show that Alu/Alu recombination events are highly biased towards specific genes. The first to show evidence for this was the LDLR gene, which has at least eight independent cases. It was also reported that these recombination events appeared to take place in a preferred location within the Alu element (42,43). These data suggested that Alu elements may represent hot spots for recombination by a mechanism that was more than simple homologous recombination. Multiple Alu/Alu recombination events have also occurred in the germ line involving two other genes.

TABLE 3

Alu/Alu Recombination and Cancer

Locus               Distribution           Disease                             Reference
10 × ALL-1 (MLL)    Somatic                Acute myelogenous leukemia          (88-90)
2 × BRCA1           Somatic and kindreds   Breast cancer                       (91,92)
MLH1                Two kindreds           HNPCC                               (93)
TRE                 Somatic                Ewing's sarcoma                     (94)
RB                  Common                 Association with glioma             (95)
EWS                 Subset of Africans     Protective against Ewing sarcoma?   (96)

Even more striking is the preferential recombination seen in somatic recombination. The ALL-1 gene, which participates in a high proportion of acute leukemias, is another hotspot for Alu/Alu recombination. This includes intragenic recombination, which is the major cause of acute myelogenous leukemia in individuals without a cytogenetic defect, as well as a possible contribution to recombination between the ALL-1 gene and other chromosomal loci in causing more complex cytogenetic defects associated with leukemia (44-46).

The genes that show high levels of Alu/Alu recombination tend to have a large number of Alu sequences. Although Alu density may help contribute to this recombination, the correlation does not seem to hold up upon analysis of other Alu-rich genes. Therefore, it seems likely that some other factor contributes to the high recombination rates seen in these genes and that the Alu elements are likely to help in that process rather than to be the primary cause.

It has generally been found that longer stretches of sequence identity allow more efficient homologous recombination and that 300 bp of imperfect sequence identity would represent a relatively inefficient target (47). Therefore, as Alu elements accumulate random mutations after integration in the genome, their recombination potential gradually decreases. Thus, early in primate evolution, when a high proportion of Alu elements were closer matches to one another, Alu/Alu recombination may have contributed even more to the evolution and reshaping of primate genomes.

Based on the above considerations, one might expect the much longer L1 family of elements to contribute significantly to recombination, as well. Surprisingly, we are familiar with only two L1/L1 recombination events in the human genome (48). Therefore, it would appear that: (1) L1 elements are located in less recombinogenic regions of the human genome; (2) the approximately 10-fold lower copy number of L1 elements is more than enough to offset their larger size in terms of probabilities of recombination; (3) some basic property of the Alu elements themselves makes them recombinogenic; or (4) the larger average spacing between L1 elements causes the vast majority of L1/L1 recombination events to be lethal. It is possible that all of these factors may contribute to this observed difference. Transient transfection experiments suggest that the third possibility may not be true since Alu sequences did not recombine more frequently than other control sequences (49). However, in their native chromatin environment, or in specific cell types or cell stimuli in vivo, Alus may still respond with higher recombination rates. We believe that the fourth possibility may be the dominant factor, however. The vast majority of Alu/Alu recombination events listed in the tables represent recombination between Alu elements within the same gene. This limits the effect of the recombination to a single gene defect. With their lower copy number and tendency to be located between genes rather than in genes, L1/L1 recombination events are likely either to involve only intergenic regions or to involve a much larger region that may cause defects in several genes simultaneously, resulting in loss of viability.

There is growing evidence that repetitive DNAs contribute to disease either through the mutations they cause during the retroposition process that forms them (16,50) or through recombination processes involving unequal cross-overs of repetitive elements. These recombination events may involve repetitive sequences of various repetition frequencies with the likelihood that longer and more perfect repeats that are near one another probably recombine well, while short, mismatched repeats (like Alu) recombine relatively poorly. However, the extremely high copy number of Alu elements makes them a major factor in the molecular basis of human diseases.

Abstract

Alu  elements  undergo  amplification  through  retroposition  and  integration  into  new  locations  throughout  primate 
genomes.  Over  500,000  Alu  elements  reside  in  the  human  genome,  making  the  identification  of  newly  inserted  Alu 
repeats  the  genomic  equivalent  of  finding  needles  in  the  haystack.  Here,  we  present  two  complementary  methods 
for  rapid  detection  of  newly  integrated  Alu  elements.  In  the  first  approach  we  employ  computational  biology  to 
mine  the  human  genomic  DNA  sequence  databases  in  order  to  identify  recently  integrated  Alu  elements.  The 
second  method  is  based  on  an  anchor-PCR  technique  which  we  term  Allele-Specific  Alu  PCR  (ASAP).  In  this 
approach,  Alu  elements  are  selectively  amplified  from  anchored  DNA  generating  a  display  or  ‘fingerprint’  of 
recently integrated Alu elements. Alu insertion polymorphisms are then detected by comparison of the DNA
fingerprints  generated  from  different  samples.  Here,  we  explore  the  utility  of  these  methods  by  applying  them 
to  the  identification  of  members  of  the  smallest  previously  identified  subfamily  of  Alu  repeats  in  the  human 
genome  termed  Ya8.  This  subfamily  of  Alu  repeats  is  composed  of  about  50  elements  within  the  human  genome. 
Approximately  50%  of  the  Ya8  Alu  family  members  have  inserted  in  the  human  genome  so  recently  that  they  are 
polymorphic,  making  them  useful  markers  for  the  study  of  human  evolution. 


Introduction 

Alu repeats are the most successful class of mobile elements in the human genome. Alu elements spread through the genome via an RNA mediated amplification mechanism termed retroposition (reviewed in Deininger & Batzer, 1993). There are over 500,000 Alu elements in the human genome, which have clearly played a major role in sculpting and/or damaging the genome. Alu elements have contributed to genetic disease, both by the disruption of genes through the insertion of newly retroposed elements and by recombination between Alu elements (reviewed in Deininger & Batzer, 1999). Previous estimates indicate that retroposition of Alu elements contributes to approximately 0.1% of human genetic diseases and recombination between Alu repeats contributes to another 0.3% of genetic diseases (Deininger & Batzer, 1999). Therefore, the spread of the Alu family of mobile elements has generated a significant amount of human genomic variation, as well as disease, through both recombination-based fluidity and insertional mutagenesis.

Alu repeats are distributed rather haphazardly throughout the human genome. Alu elements began expanding in the ancestral primate genomes about 65 mya (Shen, Batzer & Deininger, 1991), reaching a peak amplification between 35 and 60 mya. Presently, Alu elements amplify at a rate that is 100-fold lower than their peak rate, with an estimate of one new Alu insert in every 100-200 births (Deininger & Batzer, 1993, 1995). Evolutionary studies have demonstrated that the majority of evolutionarily recent Alu inserts have specific diagnostic sequence mutations (Deininger & Batzer, 1993, 1995). These mutations have accumulated in Alu elements throughout primate evolution, resulting in a hierarchical subfamily structure, or lineage, of Alu repeats. The mutations facilitate the classification of Alu elements into different subfamilies, or clades, of related elements that share common diagnostic mutations (reviewed in Batzer, Schmid & Deininger, 1993; Batzer & Deininger, 1991; Batzer et al., 1996a). Almost all of the recently integrated Alu elements within the human genome belong to one of four closely related subfamilies: Y, Ya5, Ya8, and Yb8, with the majority being Ya5 and Yb8 subfamily members. Collectively, these subfamilies of Alu elements comprise less than 10% of the Alu elements present within the human genome, with the Ya5/8 and Yb8 subfamilies together accounting for less than half of a percent of all Alu elements. These evolutionarily recent Alu insertions are useful for human population studies, since there appears to be no specific mechanism to remove newly inserted Alu repeats, and the Alu elements are identical by descent with a known ancestral state (Batzer et al., 1991, 1994a, 1996a; Stoneking et al., 1997; Perna et al., 1992).

Previously, it has been technically impossible to determine the full impact of mobile elements on the human genome. The identification of newly inserted Alu elements has been very difficult due to the complexity of detecting one new Alu insertion in a cell that already has 500,000 pre-existing Alu elements. We have previously utilized laborious library screening and sequencing strategies to isolate relatively small numbers of Alu insertion polymorphisms (Arcot et al., 1995a, b, c; Batzer & Deininger, 1991a; Batzer et al., 1990, 1991b, 1995), as well as investigating rare 300 bp restriction fragment length polymorphisms (Kass et al., 1994). This makes these studies the genomic equivalent of the search for needles in the haystack. In this paper, we discuss two alternative methods that overcome the inherent difficulties in these experiments, making these studies manageable. First, the availability of large quantities of human genomic DNA sequence provided by the Human Genome Project facilitates genomic database mining for recently integrated Alu elements. This approach should prove useful in determining the chromosome-specific and genome wide dispersal patterns of mobile elements, as well as for the identification of polymorphic mobile element fossils to apply to the study of human population genetics and primate comparative genomics. Secondly, we have developed a PCR-based method that we term Allele-Specific Alu PCR (ASAP). This technique allows us to take advantage of the subfamily-specific diagnostic mutations within Alu mobile elements to isolate and display recently integrated Alu repeats from different DNA samples, allowing for direct comparisons of the Alu content of different genomes or different cells from an individual.

Materials  and  methods 

Cell  lines  and  DNA  samples 

The cell lines used to isolate DNA samples were as follows: human (Homo sapiens), HeLa (ATCC CCL2); chimpanzee (Pan troglodytes), Wes (ATCC CRL1609); gorilla (Gorilla gorilla), Ggo-1 (primary gorilla fibroblasts) provided by Dr. Stephen J. O'Brien, National Cancer Institute, Frederick, MD, USA. Cell lines were maintained as directed by the source and DNA isolations were performed using Wizard genomic DNA purification (Promega). Human DNA samples from the European, African American and Greenland native population groups were isolated from peripheral blood lymphocytes (Ausubel et al., 1996) that were available from previous studies (Stoneking et al., 1997). Egyptian samples were collected from throughout the Nile river valley region and DNA from peripheral lymphocytes was prepared using Wizard genomic DNA purification kits (Promega). Human DNA used for ASAP was isolated from peripheral lymphocytes utilizing the super-quick gene method (Analytical Genetic Testing Center).

Computational  analyses 

Figure 1. Computational analysis of repetitive elements. The flow chart shows the computational tools utilized for the identification and analysis of recently integrated Ya8 Alu family members. The process begins with BLAST searches of the non-redundant and high-throughput genomic sequence databases. Subsequently sequences (about 1000 nucleotides) adjacent to the matches with 100% identity to the query sequence are annotated using the RepeatMasker2 or Censor server. Following sequence annotation, oligonucleotide primers complementary to the unique DNA sequences adjacent to each element are designed using the Primer3 web server. The oligonucleotides designed using Primer3 are then subjected to a second BLAST search to determine if they reside in other repetitive elements, and subsequently they are used for PCR based analyses of individual mobile elements.

A schematic overview summarizing the computational analyses of recently integrated Alu elements is shown in Figure 1. Initial screening of the GenBank non-redundant and high throughput genomic sequence (HTGS) databases was performed using the basic local alignment search tool (BLAST) (Altschul et al., 1990) available from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). The database was searched for exact complements to the oligonucleotide 5'-ACTAAAACTACAAAAAATAG-3' that is an exact match to a portion of the Alu Ya8 subfamily consensus sequence containing unique diagnostic mutations. Sequences that were exact complements to the oligonucleotide were then subjected to more detailed annotation. A region composed of 1000 bases of flanking DNA sequence directly adjacent to the sequences identified from the databases that matched the initial GenBank BLAST query was subjected to annotation using either RepeatMasker2 from the University of Washington Genome Center server (http://ftp.genome.washington.edu/cgi-bin/RepeatMasker) or Censor from the Genetic Information Research Institute (http://www.girinst.org/Censor_Server-Data_Entry_Form_s.html) (Jurka et al., 1996). These programs annotate the repeat sequence content of DNA sequences from humans and rodents.
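
The logic of the exact-match screen can be sketched on a local sequence string. This is not the BLAST search used in the study; it is a plain string search standing in for it, and it assumes that both strands should be scanned. The function names and the toy sequence are hypothetical; only the 20-base query and the 1000-base flank size come from the text.

```python
YA8_QUERY = "ACTAAAACTACAAAAAATAG"  # diagnostic Ya8 oligonucleotide quoted in the text


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_exact_matches(sequence, query=YA8_QUERY, flank=1000):
    """List (start, strand, left_flank, right_flank) for every exact match on either strand."""
    sequence = sequence.upper()
    hits = []
    for strand, probe in (("+", query), ("-", reverse_complement(query))):
        start = sequence.find(probe)
        while start != -1:
            end = start + len(probe)
            left = sequence[max(0, start - flank):start]   # up to `flank` bases upstream
            right = sequence[end:end + flank]              # up to `flank` bases downstream
            hits.append((start, strand, left, right))
            start = sequence.find(probe, start + 1)
    return sorted(hits)


# Toy sequence (made up) with one match on each strand.
toy = "G" * 30 + YA8_QUERY + "C" * 30 + reverse_complement(YA8_QUERY) + "T" * 5
hits = find_exact_matches(toy, flank=10)
for start, strand, left, right in hits:
    print(start, strand, left, right)
```

In the workflow described above, the flanking sequence returned for each hit is what would be passed on to repeat annotation and primer design.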

Primer  design  and  PCR  amplification 

PCR primers were designed from flanking unique DNA sequences adjacent to individual Ya8 Alu elements using the Primer3 software (Whitehead Institute for Biomedical Research, Cambridge, MA, USA) (http://www.genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi). The resultant PCR primers were screened against the GenBank non-redundant database for the presence of repetitive elements using the BLAST program, and primers that resided within known repetitive elements were discarded and new primers were designed. PCR amplification was carried out in 25 µl reactions using 50-100 ng of target DNA, 40 pM of each oligonucleotide primer, 200 µM dNTPs in 50 mM KCl, 1.5 mM MgCl2, 10 mM Tris-HCl pH 8.4 and Taq® DNA polymerase (1.25 U) as recommended by the supplier (Life Technologies). Each sample was subjected to the following amplification cycle: an initial denaturation of 2:30 min at 94°C, 1 min of denaturation at 94°C, 1 min at the annealing temperature, 1 min of extension at 72°C, repeated for 32 cycles, followed by a final extension at 72°C for 10 min. Twenty microliters of each sample was fractionated on a 2% agarose gel with 0.25 µg/ml ethidium bromide. PCR products were directly visualized using UV fluorescence. The sequences of the oligonucleotide primers, annealing temperatures, PCR product sizes and chromosomal locations are shown in Table 1. The phylogenetic distribution of all the Alu elements listed in Table 1 was determined by PCR amplification of human and non-human primate DNA samples. The human genomic diversity associated with each element was determined by the amplification of 20 individuals from each of four populations (African-American, Greenland Native, European and Egyptian) (160 total chromosomes). The chromosomal location of Alu repeats identified from clones that had not been previously mapped was determined by PCR amplification of National Institute of General Medical Sciences (NIGMS) human/rodent somatic cell hybrid mapping panel 2 (Coriell Institute for Medical Research, Camden, NJ).
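
The sample-size figure quoted above, and the way an insertion allele frequency would follow from PCR genotypes, can be sketched as below. The genotype counts in the example are invented purely for illustration and are not data from this study; the function names are hypothetical, and the calculation assumes diploid individuals scored as insertion/insertion, insertion/empty or empty/empty.

```python
POPULATIONS = ["African-American", "Greenland Native", "European", "Egyptian"]
INDIVIDUALS_PER_POPULATION = 20


def chromosomes_sampled(n_individuals):
    """Each diploid individual contributes two chromosomes."""
    return 2 * n_individuals


def insertion_allele_frequency(ins_ins, ins_empty, empty_empty):
    """Frequency of the Alu insertion allele from counts of the three genotypes."""
    total = chromosomes_sampled(ins_ins + ins_empty + empty_empty)
    return (2 * ins_ins + ins_empty) / total


total_chromosomes = len(POPULATIONS) * chromosomes_sampled(INDIVIDUALS_PER_POPULATION)
print(total_chromosomes)  # matches the 160 chromosomes stated in the text

# Hypothetical genotype counts for one population of 20 individuals.
example_frequency = insertion_allele_frequency(ins_ins=5, ins_empty=10, empty_empty=5)
print(example_frequency)
```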

Allele-Specific  Alu  PCR  (ASAP) 

We used a modification of the IRE-Bubble PCR method (Munroe et al., 1994), utilizing the same amplification (anchor) primer, but altering the annealed anchor/linker primers. The annealed linkers formed a Y instead of a bubble to avoid end-to-end ligation. Also, instead of blunt-end digestion, genomic DNA was digested with MseI, which cuts 5'-T'TAA-3' and does not cut in the Alu consensus. Otherwise the genomic-anchor ligations were prepared according to Munroe et al. (1994). The annealed linker primers are: MSET: 5'-TAGAAGGAGAGGACGCTGTCTGTCGAAGG-3' and MSEB: 5'-GAGCGAATTCGTCAACATAGCATTTCTGTCCTCTCCTTC-3'. The amplification (linker) primer is: LNP: 5'-GAATTCGTCAACATAGCATTTCT-3'. We placed an EcoRI site at the 5' end of the primer for the option of cloning PCR products into cloning sites of common vectors. No bands are observed on a gel when this primer is used alone with the anchored template at an annealing temperature of 55°C.
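
A few properties of the linker design can be checked directly from the sequences given above. The sketch below is an illustration, not the authors' procedure: the way the two linker strands are aligned (the 3' end of MSEB pairing with MSET immediately after a 2-base 5' overhang) is an assumption inferred from the sequences, and the helper names are hypothetical.

```python
MSET = "TAGAAGGAGAGGACGCTGTCTGTCGAAGG"             # linker strand, as given in the text
MSEB = "GAGCGAATTCGTCAACATAGCATTTCTGTCCTCTCCTTC"   # linker strand, as given in the text
LNP = "GAATTCGTCAACATAGCATTTCT"                    # amplification (linker) primer
ECORI_SITE = "GAATTC"


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def paired_stem_length(top, bottom, overhang=2):
    """Count consecutive base pairs when the 3' end of `bottom` anneals to `top`
    starting right after a 5' overhang of `overhang` bases (assumed alignment)."""
    partner = reverse_complement(bottom)
    length = 0
    for a, b in zip(top[overhang:], partner):
        if a != b:
            break
        length += 1
    return length


starts_with_ecori = LNP.startswith(ECORI_SITE)   # EcoRI site at the 5' end of LNP
lnp_offset_in_mseb = MSEB.find(LNP)              # LNP has the same sequence as part of MSEB
stem = paired_stem_length(MSET, MSEB)            # paired region; the rest forms the Y arms
print(starts_with_ecori, lnp_offset_in_mseb, MSET[:2], stem)
```

Under this assumed alignment, LNP is identical to a stretch of MSEB rather than complementary to it, so it would have a template to anneal to only after an Alu-primed strand has been copied through the linker. That reading is consistent with the observation that LNP alone gives no bands.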

Unless otherwise noted, all ASAP PCR reactions were performed in 20 µl using a Perkin-Elmer 9600 thermal cycler with the following conditions: 1 × Promega buffer, 1.5 mM MgCl2, 200 µM dNTPs, 0.25 µM primers, 1.5 U Taq polymerase (Promega) at 94°C for 2 min; 5 cycles of 94°C for 20 s, 62°C for 20 s, 72°C for 1 min 10 s; 25 cycles of 94°C for 20 s, 55°C for 20 s, 72°C for 1 min 10 s; and 72°C for 3 min. Nested Alu primers were used that move along the Alu in an upstream direction as follows: ASH (Ya5-specific): 5'-CTGGAGTGCAGTGGCGG-3'; HS18R (Ya8-specific): 5'-CTCAGCCTCCCAAGTAGCTA-3'; HS16R (Ya8-specific): 5'-CGCCCGGCTATTTTTGTAG-3'.

The ASH primer has Ya5 diagnostic nucleotides (present in both Ya5 and Ya8 subfamilies). In the first round of PCR, stock genomic DNA (2.4 ng anchored DNA) was used as the template. For subsequent rounds of amplification, PCR products were purified through microcon-30 (Amicon) columns using two centrifuge spins following the addition of 400 µl of water. For the second round of amplification, 1 µl of microcon-purified first round PCR reaction was used as the template, and for the third round 1 µl of microcon-purified second round PCR products was used. For display analysis (see below) the PCR products were 'equalized' in volume following microcon purification.

Display of anchor-Alu PCR products

Third round PCR was performed utilizing a 5' end-labeled primer incorporating [γ-32P]ATP (Amersham) with T4 polynucleotide kinase (New England BioLabs). PCR conditions were as above with the exception of using 0.188 µM of each Ya8 and LNP cold primers and 0.075 µM of end-labeled Ya8 primer. Anchor-PCR products and end-labeled molecular weight markers (φX174 DNA digest, Promega) were separated by electrophoresis on denaturing 5% Long Ranger (AT Biochem) gels, and examined by autoradiography following exposure to Amersham Hyperfilm at room temperature. DNA samples from different ethnic groups were utilized in the display to identify variants that resulted from recent Alu insertion events (polymorphism).

Verification of PCR generated DNA fragments as Ya8 products

Gels were aligned to autoradiographs by either small cuts in various parts of the gel, or placement of low-level radioactive dye on the gel prior to re-exposure. Bands were then sliced out of the gels, placed in 200 µl of water and eluted by heating at 65°C for 15 min. Samples were re-amplified with third round PCR primers, cloned and sequenced as described above. Following verification that these bands were amplified by the third round primer pair, new nested oligonucleotides based on the flanking unique sequences were designed to move, by PCR, downstream through the Alu element to the opposite flank. Annealing temperatures were adjusted to reflect the Tm of the oligonucleotide primers. Generally two or three rounds of PCR were utilized to obtain the 3' flanking sequences of the Alu. These PCR products were also cloned and sequenced in the same manner.

Results 

We present two complementary approaches that facilitate rapid detection of newly inserted Alu elements from the human genome. First, computational analyses of human genomic DNA sequences from the GenBank database are used in the identification of recently integrated Alu elements. Second, allele-specific PCR amplification is used for the selective enrichment of young Alu elements. To compare and contrast these two approaches, we present the data obtained when these methods are applied to the identification of members of the Ya8 Alu subfamily, the smallest previously reported subfamily of Alu repeats in the human genome.

Copy  number  and  sequence  diversity 

In order to estimate the copy number of Ya8 Alu family members, we determined the number of exact matches to our subfamily specific oligonucleotide query sequence as a proportion of the human genome that had been sequenced in the non-redundant database. We obtained 27 matches to the subfamily specific query sequence from the non-redundant database. Upon further sequence annotation using the RepeatMasker2 web site, five matched the Ya8 Alus previously sequenced in our laboratories (Batzer et al., 1990; Batzer & Deininger, 1991; Batzer et al., 1995). Eight of the elements identified in the search were classified as Alu Sx subfamily members, and two matched the TPA 25 Ya8 Alu family member. A total of 13 independent Ya8 Alu elements were identified from the search of the non-redundant database that were not sequenced as part of a project to specifically identify recently integrated Alu elements. The non-redundant database contained 45.3% human DNA sequences for a total of 590,140,703 bases of human sequence on the date of the search. The estimated size of the Ya8 subfamily is (3 × 10^9 bp / 590,140,703 bp) × 13 unique Ya8 matches = 66 Ya8 subfamily members. This estimate compares favorably with that of 50 previously reported based upon library screening, restriction digestion or Southern blotting (Batzer et al., 1995). An additional six matches to the Ya8 subfamily query sequence were identified in the HTGS. One of these elements was an Alu Sq subfamily member, while a second element was a duplicate copy of Ya8NBC60. PCR analyses of two elements identified in the high throughput database, Ya8NBC7 and Ya8NBC16 (GenBank accession numbers AL109937 and AC008944), were inconclusive and these elements were eliminated from further analysis. These two elements were identified from low pass first sequence runs in the HTGS database. It is not surprising that the PCR analyses failed, since the DNA sequences are of presumably lower quality than finished DNA sequences contained in the non-redundant database. However, two additional Ya8 Alu repeats (Ya8NBC8 and Ya8NBC15) were identified in the HTGS database and subjected to further analysis.
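
The copy-number extrapolation above is a simple proportional scaling, which can be reproduced as follows. The variable and function names are illustrative; the three input values are the ones stated in the text.

```python
GENOME_SIZE_BP = 3_000_000_000      # approximate human genome size used in the text
SEQUENCED_BP = 590_140_703          # human sequence in the non-redundant database
UNIQUE_YA8_MATCHES = 13             # independent Ya8 elements found in that sequence


def estimate_copy_number(matches, sequenced_bp, genome_bp):
    """Scale the number of matches found in the sequenced fraction to the whole genome."""
    return genome_bp / sequenced_bp * matches


ya8_estimate = estimate_copy_number(UNIQUE_YA8_MATCHES, SEQUENCED_BP, GENOME_SIZE_BP)
print(round(ya8_estimate))
```

The scaling assumes that Ya8 elements are spread evenly between sequenced and unsequenced regions, which is an approximation.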

A  comparison  of  the  nucleotide  sequences  of  all  of 
the  Ya8  Alu  family  members  is  shown  in  Figure  2.  In 
order  to  determine  the  time  of  origin  for  the  Ya8  sub¬ 
family  we  divided  the  nucleotide  substitutions  within 
the  elements  into  those  that  have  occurred  in  CpG  di¬ 
nucleotides  and  those  that  have  occurred  in  non-CpG 
positions.  The  distinction  between  types  of  mutations 
is  made  because  the  CpG  dinucleotides  mutate  at  a  rate 
that  is  about  10  times  faster  than  non-CpG  positions 
(Labuda  &  Striker,  1989;  Batzer  et  al.,  1990)  as  a 
result  of  the  deamination  of  5-methylcytosine  (Bird, 
1980).  A  total  of  14  non-CpG  mutations  and  8  CpG 
mutations  occurred  within  the  14  Alu  Ya8  subfamily 
members  reported.  Using  a  neutral  rate  of  evolution 
for  primate  intervening  DNA  sequences  of  0.15% 
per  million  years  (Miyamoto,  Slightom  &  Goodman, 
1987) and the non-CpG mutation rate of 0.413%
(14/3388 using only non-CpG bases) within the 14
Ya8 Alu elements yields an estimated age of 2.75 million
years old for the Ya8 subfamily members. This
estimate of age is somewhat higher than the 660,000
years previously reported (Batzer et al., 1995). However,
the previous study of Ya8 Alu family members
involved only four elements making the calculated age
more subject to random statistical fluctuation. This estimate
is also consistent with the expansion of a family
of mobile elements that began around the time humans
and African apes diverged, which is thought to have
occurred 4-6 million years ago (Miyamoto, Slightom
& Goodman, 1987).

Figure 2. Multiple alignment of Ya8 subfamily members. The
Ya8 subfamily consensus (con) is derived from the most common
nucleotide found at each position within the subfamily members.
Nucleotide substitutions at each position are indicated with the
appropriate nucleotide. Deletions are marked by

Figure 3. Nucleotide sequences flanking Ya8 subfamily members.
Nucleotide sequences flanking the Ya8 Alu family members are
shown. Nucleotides encompassed in the direct repeats are underlined.
The length of the oligo-dA rich tail is denoted by an (A) and
a subscript indicating the number of adenine residues.
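
The age calculation described above divides the observed non-CpG mutation density by the neutral substitution rate. A minimal sketch, assuming only the figures quoted in the text (the function and parameter names are hypothetical):

```python
def subfamily_age_myr(non_cpg_mutations, non_cpg_bases,
                      neutral_rate_pct_per_myr=0.15):
    """Estimate subfamily age in million years from non-CpG mutations.

    density (%) = 100 * mutations / bases examined
    age (myr)   = density (%) / neutral rate (% per myr)
    """
    if non_cpg_bases <= 0:
        raise ValueError("non_cpg_bases must be positive")
    density_pct = 100.0 * non_cpg_mutations / non_cpg_bases
    return density_pct / neutral_rate_pct_per_myr


# 14 non-CpG mutations over 3388 non-CpG bases in the 14 Ya8 elements
ya8_age = subfamily_age_myr(14, 3388)
print(f"{ya8_age:.2f}")  # 2.75
```

With only 14 mutations in the numerator, each additional substitution changes the estimate by roughly 0.2 million years, which illustrates why small samples give unstable ages.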

Inspection  of  the  nucleotide  sequences  flanking 
each  Ya8  Alu  family  member  shows  that  all  of  the 
elements  were  flanked  by  short  perfect  direct  repeats 
(Figure 3). The direct repeats ranged in size from 3 to
17 nucleotides. These direct repeats are fairly typical
of recently integrated Alu family members. Two of
the Ya8 Alu family members contained 5' truncations
(Ya8NBC2 and Ya8NBC11). Since Ya8NBC2
and Ya8NBC11 are both flanked by perfect direct
repeats, the truncations in these elements probably occurred
as a result of incomplete reverse transcription
or improper integration into the genome rather than by
post-integration instability. All of the Ya8 Alu family
members  had  oligo-dA  rich  tails  that  ranged  in  length 
from  a  minimum  of  four  nucleotides  to  over  40  bases 
in  length.  It  is  also  interesting  to  note  that  the  3'  oligo- 
dA  rich  tails  of  several  of  the  elements  (Ya8NBC2, 
Ya8NBC3,  Ya8NBC4,  and  Ya8NBC8)  have  accumu¬ 
lated  random  mutations  beginning  the  process  of  the 
formation  of  simple  sequence  repeats  of  varied  se¬ 
quence  complexity.  The  oligo-dA  rich  tails  and  middle 
A  rich  regions  of  Alu  elements  have  previously  been 
shown  to  serve  as  nuclei  for  the  genesis  of  simple 
sequence  repeats  (Arcot  et  al.,  1995b). 

Phylogenetic distribution and chromosomal location

The  phylogenetic  distribution  of  each  Ya8  Alu  element 
was  determined  by  amplifying  genomic  DNA  from 
two  non-human  primates  (common  chimpanzee  and 
gorilla).  All  of  the  Ya8  Alu  family  members  were  ab¬ 
sent  from  the  genomes  of  non-human  primates.  This 
suggests  that  the  majority  of  these  elements  dispersed 
within  the  human  genome  sometime  after  the  human 
and African ape divergence. The chromosomal location
of each Ya8 Alu element was taken directly from
the  GenBank  database  entry  or  determined  by  PCR 
amplification  of  human/rodent  monochromosomal  hy¬ 
brid  cell  line  DNA  samples  (Table  1). 

Human  genomic  diversity 

In  order  to  determine  the  human  genomic  variation 
associated  with  each  of  the  Ya8  Alu  family  members 
we  subjected  a  panel  of  human  DNA  samples  to  PCR 
amplification (Table 2). The panel was composed of
20 individuals each of European origin, African Americans,
Greenland Natives and Egyptians, for a total of 80
individuals (160 chromosomes). Using this approach
four  of  the  14  (Ya8NBC8,  Ya8NBC10,  Ya8NBC14 
and  Ya8NBC15)  Alu  Ya8  subfamily  members  were 
monomorphic  for  the  presence  of  the  Alu  element 
suggesting  that  these  elements  integrated  in  the  gen¬ 
ome  prior  to  the  radiation  of  modern  humans  from 
Africa.  Three  of  the  elements  (Ya8NBC2,  Ya8NBC13 
and  Ya8NBC17)  appeared  heterozygous  in  all  of  the 
individuals  that  were  analyzed,  suggesting  that  they 
had  integrated  into  previously  undefined  repetitive 
elements  within  the  human  genome  as  previously  de¬ 
scribed  (Batzer  et  al.,  1991).  However,  the  remaining 
seven  elements  were  polymorphic  for  the  presence  of 
an  Alu  repeat  within  the  genomes  of  the  test  panel  in¬ 
dividuals  (Table  2).  The  unbiased  heterozygosity  val¬ 
ues  (corrected  for  small  sample  sizes)  for  these  poly¬ 
morphic  Alu  insertions  were  variable,  and  approached 
the  theoretical  maximum  in  several  cases.  This  is  quite 
interesting  since  the  maximum  uncorrected  heterozy¬ 
gosity  for  these  biallelic  elements  is  50%  and  suggests 
that  these  Alu  insertion  polymorphisms  will  make  ex¬ 
cellent  markers  for  the  study  of  human  population 
genetics.  In  addition,  50%  of  the  randomly  identified 
Ya8  Alu  family  members  are  polymorphic.  These  res¬ 
ults  suggest  that  the  Ya8  subfamily  is  younger  than 
either  the  Ya5  (from  which  Ya8  was  derived)  or  Yb8 
Alu  subfamilies,  since  only  25%  of  the  members  of 
these  Alu  subfamilies  are  polymorphic  in  the  human 
genome  (Batzer  et  al.,  1995). 
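
The text does not state which small-sample correction was applied to the heterozygosity values. The sketch below assumes the commonly used unbiased estimator for gene diversity, h = n/(n - 1) x (1 - sum of squared allele frequencies), where n is the number of chromosomes sampled. The function name and the example counts are hypothetical.

```python
def unbiased_heterozygosity(insertion_count, chromosomes):
    """Unbiased heterozygosity for a biallelic Alu insertion locus.

    insertion_count: chromosomes carrying the Alu insertion
    chromosomes:     total chromosomes sampled (2 per individual)
    Assumes the n/(n-1) small-sample correction.
    """
    if chromosomes < 2:
        raise ValueError("need at least two chromosomes")
    p = insertion_count / chromosomes      # insertion allele frequency
    q = 1.0 - p                            # empty-site allele frequency
    uncorrected = 1.0 - (p * p + q * q)    # at most 0.5 for two alleles
    return chromosomes / (chromosomes - 1) * uncorrected


# Hypothetical population: 20 individuals (40 chromosomes), insertion at 50%
h = unbiased_heterozygosity(insertion_count=20, chromosomes=40)
print(f"{h:.4f}")  # 0.5128
```

Under this assumption the corrected value can slightly exceed the uncorrected 50% ceiling, which is why the text distinguishes the two.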

Allele-Specific  Alu  PCR  (ASAP) 

Although  database  screening  is  extremely  efficient  for 
identifying  recent  Alu  elements,  it  will  not  allow  iden¬ 
tification  of  new  elements  from  genomes  not  included 
in  the  sequencing  efforts.  Our  primary  objective  with 
the  ASAP  technique  is  to  rapidly  identify  newly  in¬ 
serted  Alu  elements  from  a  background  of  500,000 
older Alus. To accomplish this feat, we utilized a
modification of the IRE-bubble PCR technique (Munroe
et al., 1994). The procedure utilizes an anchored
PCR strategy (Figure 4) in which genomic DNA is
cleaved with an enzyme that does not cleave within
the Alu repeat. The modified anchor is then ligated to
the fragment ends. This anchor will only allow PCR
amplification if a primer first primes within the fragment
and replicates across the linker, eliminating any
problems with amplification from anchor to anchor.
We take advantage of the base changes that identify the
younger Alu subfamily members (Batzer et al., 1996b;
Batzer & Deininger, 1991). In addition, this allows
the selective enrichment for a smaller fraction of the
Alu elements from the genome, as there are only 1000
Ya5 and 1000 Yb8 Alu repeats and approximately
50 Ya8 Alu family members in the human genome
(Batzer et al., 1995). We gain the specificity for the
recent inserts by using a PCR primer that matches the
particular Alu subfamily with the diagnostic positions
at its 3' end. Each amplification will extend from a
specific Alu subfamily member through its upstream
flanking sequences to the randomly located flanking
restriction site. The numerous older Alu repeats have
accumulated many mutations and may compete for
the PCR primers with the Ya5/8 elements. Therefore,
although the first amplification provides a great deal
of subfamily specificity, we then carry out a ‘nested’
reaction using a second allele-specific primer to improve
the specificity, followed by a third round with
another allele-specific primer. In theory, we can utilize
primers for each of the 5-8 diagnostic mutations in a
subfamily.

Figure 4. The Allele-Specific Alu PCR (ASAP) anchor strategy.
Schematic diagram of the technique for the isolation of a designated
subset of Alu repeats based on a modification of the IRE-bubble
PCR technique (Munroe et al., 1994). The shaded rectangle represents
an Alu sequence in genomic DNA. The MseI (or an alternative
restriction enzyme) cleaves in unique sequences flanking the Alu
repeat (small arrows). The anchors with the complementary MseI
site are ligated. The anchors are designed so that the two oligonucleotide
strands base-pair only at the MseI site end, but not at the
other end (represented here schematically with four arbitrary bases).
PCR is initiated using an allele-specific Alu primer (Z'). The anchor
primer will not be able to base pair, preventing anchor-to-anchor
amplification. Only those fragments (a) generated by the Alu primer
are available for amplification by the anchor primer. The amplified
product (a and a') provides a template for nested PCR (primer y') to
further decrease the background.

In  the  example  presented  in  this  paper,  we  fo¬ 
cused  our  attention  on  the  identification  and  display 
of  the  lower  copy  number  Alu  Ya8  subfamily.  Also, 
to  better  display  the  results,  we  used  nested  primers  in 
the  upstream  direction  of  Ya8  to  avoid  amplification 
problems  through  the  A-rich  tail.  Using  the  primers 
described  in  the  Materials  and  methods  section,  by 
the  third  round  of  PCR,  we  were  able  to  visualize 
discrete  DNA  fragments  on  an  agarose  gel  (data  not 
shown).  The  size  range  of  these  fragments  appeared 
to  be  between  150 bp  and  800  bp.  To  enhance  this 
display,  we  chose  an  alternative  method  of  electro¬ 
phoretic  separation  and  end-labeled  the  nested  primer 
to  further  minimize  background  (see  below).  To  verify 
these  were  Ya8  repeats,  we  directly  cloned  the  third 
round  PCR  products  and  sequenced  them.  Partial  or 
complete  sequences  of  these  products,  using  vector 
primers  in  both  directions,  demonstrated  all  12  clones 
to  be  amplified  by  the  Alu-anchor  primer  pair,  al¬ 
though  in  one  case  the  unique  linker  sequence  was 
imprecise. All these elements contained the Ya5/8 diagnostic
nucleotides (there were no further upstream
diagnostics to declare these as Ya8 elements).

For eight of the 12 isolated clones, there were
between 12 and 18 unique nucleotides between the
linker and the Alu (or truncated Alu) sequences. Since
Alu elements preferentially insert into A-T rich regions
(Daniels & Deininger, 1985) and MseI cuts at
the sequence TTAA, this result is not surprising.
The advantage of using MseI for the restriction
digestion is that most of the Alu-linker products are
small enough to be amplified. Although it would be
difficult  to  perform  nested  PCR  in  the  opposite  direc¬ 
tion  with  those  few  A-T  rich  nucleotides,  searching 
GenBank  using  the  BLAST  program  with  the  obtained 
flanking  unique  DNA  sequences  as  the  query  may 
in  some  cases  identify  the  rest  of  the  genomic  se¬ 
quence  for  each  Alu  element.  This  will  provide  the 
Alu  location  with  both  its  flanking  sequences.  Flank¬ 
ing  unique  sequence  primers  can  then  be  designed  and 
the  Alu  polymorphism  can  then  be  confirmed  using 
other  human  DNA  sources.  Once  the  polymorphism 
is  confirmed  subsequent  population  studies  can  be 
performed. 
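
The short unique flanks discussed above follow from how close the nearest upstream TTAA site tends to lie to an Alu insertion. The sketch below is a simplified in silico illustration, not the authors' procedure: it counts the bases between the end of the nearest upstream MseI recognition site and the start of the Alu, ignoring the exact cut position within the site. The function name and the toy sequence are hypothetical.

```python
MSEI_SITE = "TTAA"  # MseI recognition sequence


def upstream_flank_length(sequence, alu_start):
    """Bases between the nearest upstream MseI site and the Alu start.

    sequence:  genomic sequence as a string
    alu_start: 0-based index of the first Alu base
    Returns None if no complete site lies upstream of the Alu.
    """
    site = sequence.upper().rfind(MSEI_SITE, 0, alu_start)
    if site == -1:
        return None
    return alu_start - (site + len(MSEI_SITE))


# Toy example: 4 bp, an MseI site, 14 bp of unique flank, then the Alu 5' end
toy = "GGCC" + "TTAA" + "ACGTACGTACGTAC" + "GGCCGGGCGCGGTGG"
flank = upstream_flank_length(toy, alu_start=22)
print(flank)  # 14
```

Run against finished genomic sequence, a function like this could indicate in advance which elements would yield flanks too short for designing nested primers.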

Display and rapid identification of Ya8 associated
variants

To alleviate the need for testing every Ya8 element
obtained by this assay, we chose to end-label the
third round nested PCR primer to enable a display
of individual Ya8 repeats following electrophoretic
separation and autoradiography. Observed variations
may be due to primer mismatch, genomic rearrangements,
small insertion/deletions or Alu based insertion/deletions
(I/D).

We  carried  out  the  procedure  with  four  different 
individuals  to  discern  which  bands  represent  vari¬ 
ants  (Figure  5),  and  to  effectively  display  variants  as 
DNA  fingerprints.  We  obtained  about  40  bands  per 
individual  from  a  single  reaction.  Among  the  four 
individuals  analyzed,  about  one  half  of  the  bands  ap¬ 
peared  variant  (Figure  5).  We  have  developed  a  potent 
method for the generation of Ya8 associated DNA
fingerprints  that  is  in  reasonable  agreement  with  the 
database  mining  approach  and  seems  to  display  the 
majority  of  Alu  subfamily  members.  This  necessitated 
addressing  what  proportion  of  the  fragments  generated 
were the result of the presence of a Ya8 Alu element
and  whether  the  lack  of  the  same  band  in  another  in¬ 
dividual  represented  an  Alu  insertion  polymorphism. 
We  chose  12  bands  to  re-amplify  and  verify  as  Ya5/8 
elements.  Those  bands  that  appeared  variant  were  ana¬ 
lyzed  for  Alu  insertion  polymorphisms.  Other  bands 
were  selected  for  future  testing  of  dimorphisms  as 
these individual Ya8 elements may display variation
among  other  people/populations.  Occasionally,  upon 
re-amplification  from  the  isolated  band,  we  obtained 
background  products  and  therefore,  generally  more 
than  one  clone  was  sequenced.  Of  the  12  isolated 
bands (Figure 5), nine were verified as precisely amplified
HS16R-LNP products. Two others each contained
a Ya5/8 Alu, one randomly amplified by HS16R (anc-8)
in lieu of the linker primer, while anc-3 contained
sequences downstream of HS16R. And 4 apparently
was an amplified J (PS) Alu element (data not shown).
Therefore, this demonstrates that the majority of the bands
visualized on the autoradiograph are Alu Ya5/8 repeats
and most probably Ya8. The numerous bands at about
178 nt coincide with our previous finding that many
of the products will have between 12 and 18 unique
nucleotides. Of the nine bands where we attempted to
obtain the opposite flank by nested anchored PCR, we
reached the opposite (downstream) flank of the Alu for
three of them (anc-5, anc-6, anc-4). In some cases the
amount of unique sequence was too small to employ
nested primers, and in some cases there was a high
level of A-T richness. In one case we merely got a non-specific
product. All three sequences obtained were
authentic Ya8 Alu elements based on the diagnostic
nucleotide positions and the high level of conservation
of the sequence in relation to the consensus. This
demonstrates the successful nature of our protocol to
select for this subfamily of repeats amongst a large
background of Alu repeats.

Figure 5. DNA fingerprints of unrelated individuals based on
anchored-Alu PCR. Individual bands are numbered for identification
purposes. Fragment lengths are shown in nucleotides to the
left. DNA samples used are of Caucasian (lane a), Hispanic (lane
b), Hindu-Indian (lane c) and Chinese (lane d) descent.

When  ‘crossing’  the  anc-5  Alu  by  nested  PCR  us¬ 
ing  four  individuals  (not  all  identical  to  Figure  5),  we 
found  a  correspondence  between  the  generation  of  a 
distinct  band  among  the  individuals  that  also  had  the 
anc-5  band  on  an  autoradiograph.  However,  we  ob¬ 
tained  a  short  3'  flank  of  12  nucleotides  that  proved 
difficult  in  amplifying  DNA  from  various  individuals 
with  unique  flanks.  It  is  still  possible  that  this  variant 
represents  an  I/D  event.  Besides  anc-5,  anc-6  also  ap¬ 
peared  polymorphic  on  the  autoradiograph,  although 
anc-4  did  not.  However,  since  we  had  both  flanks,  for 
these  Alu  elements,  we  developed  primers  to  rapidly 
assess  various  individuals  for  an  insertion  variant.  For 
anc-6,  one  of  a  few  different  primer  sets  worked  well, 
yielding the band of expected size, although also generating
a few non-specific bands. However, a band was
present for 11 unrelated individuals analyzed (data not
shown), including those observed on the autoradiograph,
suggesting that the anc-6 polymorphism was not
the result of an I/D variant. In addition, this band was
absent  in  the  chimpanzee,  possibly  indicating  the  ab¬ 
sence  of  the  Alu  or  perhaps  primer  mismatch  due  to 
nucleotide  divergence.  Although  anc-4  was  not  vari¬ 
ant  on  the  autoradiograph,  we  tested  13  individuals  of 
various  ethnic  backgrounds  for  an  I/D  event  and  ob¬ 
served  it  to  be  monomorphic.  Although  we  have  not 
verified  any  of  the  displayed  variants  to  be  the  result 
of  an  Alu  insertion,  this  potential  remains,  as  we  ob¬ 
served  Ya8  elements  to  be  highly  polymorphic,  and  all 
the  bands,  but  one,  analyzed  were  Ya8  repeats. 


Discussion 

In  this  manuscript  we  present  an  analysis  of  the  smal¬ 
lest  defined  subfamily  of  Alu  elements  located  within 
the  human  genome  termed  Ya8.  This  subfamily  of  Alu 
elements  was  derived  from  the  Ya5  subfamily  of  Alu 
elements. The Ya5 subfamily is composed of approximately
1000 members and has largely integrated into
the human genome sometime after the human-African
ape divergence. The main reasons that supported the
more  recent  origin  of  the  Ya8  subfamily  are  the  accu¬ 
mulation  of  three  additional  diagnostic  mutations  as 
compared  to  the  Ya5  subfamily  and  the  lower  copy 
number  for  the  Ya8  subfamily.  It  is  also  important  to 
note  that  a  higher  percentage  of  the  Ya8  Alu  family 
members  (50%)  are  polymorphic  for  insertion  pres¬ 
ence/absence  as  compared  to  only  25%  polymorphism 
in  the  Yb8  and  Ya5  Alu  subfamilies.  These  data  also 
suggest  a  recent  origin  for  the  Alu  Ya8  subfamily 
within  the  human  genome.  However,  it  is  still  possible 
that  the  Ya8  Alu  subfamily  may  have  amplified  from 
an  allelic  variant  of  the  Ya5  subfamily  that  was  not  as 
efficient  at  mobilization  as  the  Ya5  source  gene. 

The  ability  to  detect  a  handful  of  Alu  repeats 
from  the  background  of  several  hundred  thousand  Alu 
elements  in  the  human  genome  is  impressive.  The  ap¬ 
plication  of  computational  biology  to  the  analysis  of 
large  multigene  families  such  as  Alu  repeats  offers 
the  potential  to  address  a  number  of  new  questions 
in  comparative  genomics  as  an  increasing  proportion 
of  the  human  genome  is  sequenced.  Studies  of  the 
present,  as  well  as  ancient,  integration  patterns  of  mo¬ 
bile  elements  in  the  human  genome  may  begin  to  be 
addressed.  In  addition,  the  patterns  of  diversity  gen¬ 
erated  by  the  integration  of  mobile  elements  into  the 
human  genome  may  be  analyzed  at  a  scale  that  was 
previously  unimaginable.  These  types  of  studies  will 
shed  new  insight  into  the  relationships  between  differ¬ 
ent  types  of  mobile  elements  in  the  human  genome, 
integration  site  preferences,  impact,  and  the  biological 
properties  of  these  elements. 

The  development  of  the  ASAP  technique  facilit¬ 
ated  the  display  of  a  subset  of  Ya8  Alu  elements  from 
a  large  and  complex  background.  The  preferential  isol¬ 
ation  of  the  young  Alu  elements,  as  demonstrated 
here,  enhances  the  identification  of  recent  Alu  inser¬ 
tion  events  in  the  genome.  We  focused  our  efforts  on 
the  smallest  known  defined  subfamily  of  Alu  repeats 
to  best  address  issues  of  sensitivity  of  the  display  of 
individual  elements.  One  of  the  advantages  of  this 
technique  is  its  flexibility.  Altering  the  restriction  en¬ 
zyme  used  for  digestion  of  genomic  DNA  selects  for 
distinct  subsets  of  Alu  elements  within  a  particular 
subfamily,  since  this  technique  preferentially  amplifies 
products that range from 200 to 800 bp in size. In
addition,  modifications  to  the  ASAP  technique,  such 
as  the  use  of  a  less  frequent  restriction  endonuclease, 
may  allow  for  a  display  of  subsets  of  the  larger  groups 
of  Alu  repeats  such  as  Ya5  elements.  Alternatively,  the 
use of primers that select for subfamily ‘subgroups’
may  also  be  used  to  reduce  the  complexity  of  the 
resultant  display  by  decreasing  the  number  of  PCR 
products.  Although  we  focused  on  Ya8  Alu  elements 
due  to  their  low  copy  number,  the  young  Yb8  Alu 
subfamily is another alternative for ASAP with an estimated
copy number of only 1000 elements (Batzer
et al., 1995; Zietkiewicz et al., 1994) and some polymorphic
members (Hutchinson et al., 1993; Hammer,
1994; Arcot et al., 1998). We have previously demonstrated
the isolation of young Alu elements (based on
sequence  identity  to  a  consensus)  using  a  Yb8  dia¬ 
gnostic  primer,  and  a  generic  Alu  as  an  anchor  in  the 
amplification  reaction,  that  can  be  profiled  with  min¬ 
imal  background  (Kass,  Batzer  &  Deininger,  1996). 
It  is  conceivable  that  variations  on  the  anchored-Alu 
PCR  technique  can  be  employed  to  rapidly  localize  in¬ 
dividual  elements  from  all  three  subfamilies  of  young 
Alu  elements. 

Once  the  flanking  sequences  of  the  young  Alu 
elements  are  obtained,  the  PCR  strategy  can  be  em¬ 
ployed  to  trace  polymorphisms  that  have  resulted  from 
recent  Alu  insertions  and  are  not  yet  fixed  in  hu¬ 
man  populations.  The  anchored-Alu  PCR  approach 
not  only  facilitates  rapid  identification  of  young  ele¬ 
ments  by  displaying  the  amplification  products,  but 
will  also  increase  the  potential  for  selecting  only  those 
mobile  element  fossils  that  exhibit  presence/absence 
variation.  Selection  in  this  manner  also  shifts  the  spec¬ 
trum  for  new  elements  toward  the  elements  that  are 
lower  frequency  and  less  likely  to  be  held  in  com¬ 
mon  between  individuals  or  populations.  Therefore, 
this  approach  should  prove  to  be  quite  useful  for  the 
ascertainment  of  mobile  element  fossils  to  address 
questions  about  more  recent  human  diversifications.  In 
contrast,  the  identification  of  mobile  element  fossils 
using  computational  biology  affords  the  opportunity 
to  identify  multiple  frequency  classes  of  Alu  elements 
that  are  shared  at  different  geographic  levels  within  the 
human  population. 

The  ASAP  method’s  strength  comes  from  its  abil¬ 
ity  to  isolate  a  subset  of  interspersed  repeat  sequences 
from  different  DNA  sources  and  compare  them  at  the 
same  time.  In  other  words,  this  approach  is  not  limited 
to  Alu  elements,  but  may  be  used  with  other  SINEs 
(from  other  organisms)  or  even  long  interspersed  ele¬ 
ments  (LINEs)  or  for  that  matter  any  repeated  DNA 
sequence family that has a defined subfamily structure.
A second potential application would be the use
of ASAP to monitor genomic instability associated
with different forms of cancer by providing a multilocus
monitoring system. Due to its high flexibility the
ASAP  technique  has  an  enormous  range  of  potential 
applications. 

Mobile  element  fossils  have  proven  to  be  simple 
powerful  tools  for  tracing  the  origin  of  human  popula¬ 
tions  (Perna  et  al.,  1992;  Batzer  et  al.,  1994a,b,  1996a; 
Stoneking  et  al.,  1997).  These  elements  should  also 
prove  quite  useful  to  the  forensic  community  as  pa¬ 
ternity  identity  testing  reagents  (Batzer  &  Deininger, 
1991;  Novick  et  al.,  1993).  Some  Alu  insertion  poly¬ 
morphisms  have  been  identified  by  chance  (Deininger 
&  Batzer,  1995)  while  others  have  been  identified  by 
library  screening  in  a  directed  approach  (Batzer  & 
Deininger,  1991;  Batzer  et  al.,  1995;  Arcot  et  al., 
1995a,  b,  c;  Batzer  et  al.,  1996a;  Arcot  et  al.,  1998). 
Here,  we  have  presented  two  complementary  meth¬ 
ods  involving  computational  biology  and  PCR  based 
displays  that  will  enhance  our  ability  to  identify  the 
genomic  fossils  of  recently  integrated  mobile  elements 
from  complex  genomes.  These  approaches  will  con¬ 
tribute  to  a  new  era  in  biological  sciences  that  will 
increasingly  rely  upon  informatics/computational  bio¬ 
logy  as  well  as  hard-core  bench  molecular  biology  to 
answer  global  questions  in  comparative  genomics. 

We have utilized computational biology to screen GenBank for
the  presence  of  recently  integrated  Ya5  and  Yb8  Alu  family 
members.  Our  analysis  identified  2640  Ya5  Alu  family  members 
and  1852  Yb8  Alu  family  members  from  the  draft  sequence  of 
the  human  genome.  We  selected  a  set  of  475  of  these  elements 
for  detailed  analyses.  Analysis  of  the  DNA  sequences  from  the 
individual  Alu  elements  revealed  a  low  level  of  random 
mutations  within  both  subfamilies  consistent  with  the  recent 
origin  of  these  elements  within  the  human  genome.  Polymerase 
chain  reaction  assays  were  used  to  determine  the  phylogenetic 
distribution  and  human  genomic  variation  associated  with  each 
Alu repeat. Over 99% of the Ya5 and Yb8 Alu family members
were restricted to the human genome and absent from ortholo-
mates, confirming the recent origin of these Alu subfamilies in
the  human  genome.  Approximately  1%  of  the  analyzed  Ya5 
and  Yb8  Alu  family  members  had  integrated  into  previously 
undefined  repeated  regions  of  the  human  genome.  Analysis  of 
mosaic  Yb8  elements  suggests  gene  conversion  played  an 
Ya5 and Yb8 Alu family members were polymorphic for inser-
human populations. The newly identified Alu insertion poly¬
morphisms  will  be  useful  tools  for  the  study  of  human  genomic 
diversity. 

Introduction

Alu elements are the most abundant Short
INterspersed Elements (SINEs), reaching a copy
number of over one million in the human genome,
making them the mobile element with the
highest copy number. Alu repeats compose
greater than 10% of the mass of the human genome.
Full-length Alu elements are approximately
300 bp in length and commonly found in
introns, 3' untranslated regions of genes, and
intergenic genomic regions. Amplification of
Alu elements occurs through the reverse transcription
of RNA in a process termed retroposition.
However, Alu elements have no open
reading frames, so they are thought to parasitize
the required factors for their amplification from
Long Interspersed Elements (LINEs). Although
the human genome contains over one million
Alu elements, only a few Alu elements, termed
"master" or source genes, are retroposition competent.
The crucial factor(s) that determine an
Alu as a functional source gene are not fully
known. Several factors have been suggested to
influence the amplification process, including
transcriptional capacity, priming or self-priming
for reverse transcription and others.

Alu elements first appeared in the primate genomes
over 65 million years (myr) ago. Since then,
the amplification of Alu elements within the
human genome has been punctuated, with the current
rate being at least 100-fold slower than the
initial rate of Alu expansion within primate genomes.
Throughout Alu evolution, the source
gene(s) accumulated mutations that were incorporated
into the new copies made, creating new Alu
subfamilies. Therefore, the Alu family is composed
of a number of distinct subfamilies characterized
by a hierarchical series of mutations that result in a
series of subfamilies of different ages. Of these
subfamilies, almost all of the recently integrated
Alu elements within the human genome belong to
one of several closely related "young" Alu subfamilies:
Y, Yc1, Yc2, Ya5, Ya5a2, Ya8, Yb8,
and Yb9, with the majority being Ya5 and Yb8
subfamily members.

The availability of a draft human genomic
DNA sequence as a result of the Human Genome
Project facilitates the "in silico" identification
of recently integrated Alu elements from
the human genome. This method proves to
be less demanding in comparison to older
approaches, such as cloning and library screening.
These recently integrated Alu elements
serve as temporal landmarks in the evolution of
our genome, and many of them will prove to be
useful in the study of human evolution and in
the study of the natural history of different
regions of the genome. Here, we present an
analysis of the human genomic diversity associated
with 475 members of the Alu Ya5 and Yb8
subfamilies in the human genome.


Results 

Subfamily  copy  number  and  sequence  diversity 

In  order  to  determine  the  copy  number  of  each 
subfamily  of  Alu  elements,  we  searched  the  draft 
sequence  of  the  entire  human  genome  for  the  pre¬ 
sence  of  Alu  repeats  using  oligonucleotide 
sequences  complementary  to  each  of  the  subfami¬ 
lies  (outlined  in  the  Materials  and  Methods).  Our 
query  of  the  draft  human  genome  sequence  ident¬ 
ified  2640  Alu  Ya5  subfamily  members  and  1852 
Alu  Yb8  subfamily  members.  Both  of  these  copy 
numbers  are  in  good  agreement  with  previous  esti¬ 
mates  of  the  sizes  of  these  Alu  subfamilies  based 
upon high-resolution restriction mapping and
computational biology.

A  comparison  of  the  nucleotide  sequences  of  all 
of  the  Ya5  and  Yb8  Alu  family  members  can  be 
found  at  our  website  (http://129.81.225.52).  In 
order  to  determine  the  time  of  origin  for  the 
respective  Ya5  and  Yb8  subfamilies,  we  divided 
the  nucleotide  substitutions  within  the  elements  in 
each  family  into  those  that  occurred  in  CpG  dinu¬ 
cleotides  and  those  that  occurred  in  non-CpG  pos¬ 
itions.  The  distinction  between  types  of  mutations 
is  made  because  the  CpG  dinucleotides  mutate  at  a 
rate  that  is  about  ten  times  faster  than  non-CpG 
positions^'^^  as  a  result  of  the  deamination  of  5- 
methylcytosine.^^  In  addition,  all  insertions,  del¬ 
etions  and  5'  truncations  were  excluded  from  our 
calculations.  A  total  of  441  non-CpG  and  241  CpG 
mutations  occurred  within  the  231  Alu  Ya5  sub¬ 
family  members  used  in  this  analysis.  For  the  244 
Alu  Yb8  subfamily  members  analyzed,  a  total  of 
478  non-CpG  and  275  CpG  mutations  were 
observed.  Using  a  neutral  rate  of  evolution  for  pri¬ 
mate  intervening  DNA  sequences  of  0.15%  per 
million  years^^  and  the  non-CpG  mutation  density 
of  0.799%  (441/55,209)  within  the  231  Ya5  Alu 
elements  yields  an  estimated  age  of  5.32  million 
years  for  the  Ya5  subfamily  members.  Using  only 
non-CpG mutations in the 244 Yb8 sequences
yields  an  estimate  of  5.30  million  years  old  for  the 
Yb8  subfamily  (478/60,024).  This  estimate  of  age  is 
somewhat higher than the 2.7-4.1 million years
previously reported. However, the previous study of
Ya5  and  Yb8  Alu  family  members  involved  only  a 
small  number  of  elements  making  the  calculated 
subfamily  ages  more  subject  to  random  statistical 
fluctuation.  Alternatively,  the  new  estimated  age 
based  upon  non-CpG  mutations  may  be  artificially 
inflated  due  to  sequencing  errors  in  the  human 
draft  sequence  that  may  account  for  an  increase  in 
the  number  of  mutations  observed. 

We can also estimate the ages of each Alu subfamily
using CpG-based mutations. The only
difference in the estimate is to divide the CpG
mutation density by a mutation rate that is
approximately ten times the non-CpG rate, as
previously described.^'^^ In this case we calculate an
average CpG mutation density of 2.17% for the Ya5
subfamily (241 mutations/11,088 CpG bases)
and 2.45% for the Yb8 subfamily
(275 mutations/11,224 CpG bases). Using a neutral rate of
evolution for CpG-based sequences of 1.5%/million
years  yields  estimates  of  1.44  and  1.63  million 
years  old  for  the  Ya5  and  Yb8  Alu  subfamilies, 
respectively.  Both  estimates  are  consistent  with  the 
initiation  of  the  expansion  of  the  Ya5  and  Yb8  Alu 
subfamilies  that  is  roughly  coincident  with  the 
divergence  of  humans  and  African  apes. 
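
The age arithmetic in the two preceding paragraphs can be reproduced with a short sketch. The function and variable names are illustrative and are not part of the original analysis; the mutation counts, base counts and neutral rates are the values quoted in the text.

```python
def subfamily_age_myr(mutations, bases, rate_pct_per_myr):
    """Estimate age in million years as mutation density / neutral rate."""
    density_pct = 100.0 * mutations / bases
    return density_pct / rate_pct_per_myr

NON_CPG_RATE = 0.15  # % per million years, as quoted in the text
CPG_RATE = 1.5       # % per million years (about ten times faster)

ages = {
    ("Ya5", "non-CpG"): subfamily_age_myr(441, 55209, NON_CPG_RATE),
    ("Yb8", "non-CpG"): subfamily_age_myr(478, 60024, NON_CPG_RATE),
    ("Ya5", "CpG"): subfamily_age_myr(241, 11088, CPG_RATE),
    ("Yb8", "CpG"): subfamily_age_myr(275, 11224, CPG_RATE),
}

for (subfamily, kind), age in ages.items():
    print(f"{subfamily} {kind}: {age:.2f} myr")
```

The values obtained this way agree with the ages given above to within rounding of the second decimal place.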

Inspection  of  the  nucleotide  sequences  flanking 
each  Ya5  and  Yb8  Alu  family  member  shows  that 
most  of  the  elements  are  flanked  by  short  perfect 
direct  repeats.  The  direct  repeats  range  in  size  from 
3-23  nucleotides.  The  observed  direct  repeats  are 
fairly  typical  of  recently  integrated  Alu  family 
members.^'^  The  appearance  of  truncations  within 
a  number  of  these  elements  probably  occurred  as  a 
result  of  incomplete  reverse  transcription  or  impro¬ 
per  integration  into  the  genome  rather  than  by 
post-integration  instability.  All  of  the  Ya5  and  Yb8 
Alu  family  members  analyzed  have  oligo(dA)-rich 
tails  that  range  in  length  from  six  nucleotides  to 
over  60  nucleotides  in  length.  It  is  also  interesting 
to  note  that  the  3'  oligo(dA)-rich  tails  of  many  of 
the  elements  have  accumulated  random  mutations 
beginning  the  process  of  the  formation  of  simple 
sequence  repeats  of  varied  sequence  complexity. 
The  oligo(dA)-rich  tails  and  middle  A-rich  regions 
of  Alu  elements  have  previously  been  shown  to 
serve  as  nuclei  for  the  genesis  of  simple  sequence 
repeats.^® 

Alu  Y  to  Yb8  sequence  evolution 

In  our  query  of  the  human  genome,  we  ident¬ 
ified  88  Alu  elements  containing  one  to  seven  of 
the eight Yb8 diagnostic nucleotides. These 88
"mosaic" elements were subdivided into Yb1, Yb2,
Yb4, Yb5, Yb6 and Yb7 depending on the number
of diagnostic changes present (Figure 1(a)). To
facilitate identification of the individual elements
with different diagnostic mutation combinations,
the mosaic elements were numbered consecutively
in order of abundance (Yb1.1, Yb1.2, etc., see
Figure 1(a)). No evident sequential order of
accumulation of the Yb8 diagnostic mutations can
be easily discerned. Interpretation becomes complicated
due to the fact that four out of the eight diagnostic
mutations are CpG changes (positions 1, 2, 4
and 6; Figure 1(a)). The Alu Y has three CpG sites
(positions 1, 2 and 6) that become TpG in Yb8, and
Alu Yb8 has one (position 4). CpG dinucleotides
mutate  at  a  rate  that  is  about  9.2  times  faster  than 
non-CpG,^'^^  as  a  result  of  the  deamination  of  5- 
methylcytosine.^^  Therefore,  it  is  difficult  to  know 
if  the  presence  of  a  TpG  diagnostic  mutation  is  due 
to  a  change  in  the  Alu  source  gene  or  in  the  par¬ 
ticular  individual  Alu  element  being  evaluated. 
Because  CpG  dinucleotides  represent  hot  spots  for 
mutation,  a  high  proportion  of  CpG  positions  in 
the  Y  subfamily  might  have  mutated  to  TpG.  This 
makes  discrimination  between  source  gene  changes 
and parallel forward mutations occurring in multiple
Y elements at these loci difficult. Therefore,
we  have  eliminated  these  sites  (positions  1,  2  and 
6)  from  our  analysis  (Figure  1(b)).  Position  4  rep¬ 
resents  a  different  situation.  Because  the  TpG  to 
CpG  mutation  occurs  at  the  normal  evolutionary 
rate, it was not eliminated from the analysis. However,
some variations may be observed where individual
copies might have mutated the position
back to a TpG that need to be taken into consideration.

Figure 1. Evolution of the diagnostic nucleotide positions
from Y to Yb8 Alu elements. (a) Alignment of the
eight Alu Yb8 diagnostic nucleotides and the different
Yb1, 2, 3, 4, etc. elements found in the databases. The
eight diagnostic nucleotides are indicated in bold at the
top for Alu Y, and for Alu Yb8 at the bottom. At position 8,
- or d represents the absence or presence of the
seven nucleotide duplication, respectively. For easy
reference, individual elements containing different combinations
of the diagnostic mutations were numbered
consecutively in order of abundance (Yb1.1, Yb1.2, etc.).
The total number of elements found for each subgroup
is indicated on the left in parenthesis. Note that no
Yb1.1 was found (0). The total number of the Yb8 individual
diagnostic sites found in all the intermediate
elements is indicated at the bottom. (b) Alignment of
the same elements after eliminating the diagnostic sites
in Alu Y elements involving CpG to T changes. Commas
separate elements within the same Yb group and
dashes between different groups, i.e. Yb1.2,7-4.2 represents
Yb1.2, Yb1.7 and Yb4.2. The suggested evolutionary
order of the occurrence of the changes at the
diagnostic sites are indicated at the bottom (#1, #2 ...).

Now, a sequential evolution of the appearance
of the diagnostic sites can be obtained, starting
with position 3, then 4, 7 and/or 8, and finally
position  5  (Figure  1(b)).  The  mutation  at  position  3 
appears  to  have  occurred  first,  being  the  most  com¬ 
mon  single  nucleotide  change  with  15  Yb8  mosaic 
elements.  The  other  Alu  Yb8  mosaic  elements  with 
only  one  diagnostic  nucleotide  change  occur  in 
lower  frequencies  and  may  be  explained  by  paral¬ 
lel  mutations,  post-transcriptional  selection,®  or  by 
a  forward  gene  conversion  event.  The  order  in 
which  the  mutation  at  positions  7  and  8  (the  seven 
nucleotide  duplication)  occurred  cannot  be 
resolved  with  these  data.  Four  of  the  elements 
(Yb6.2  in  Figure  1(b))  do  not  fit  the  proposed 
sequential  evolutionary  pattern.  In  this  case  mul¬ 
tiple  recombination  events  would  be  required  to 
obtain  this  outcome  or  some  selection  occurring  at 
the  retroposition  process,  both  highly  unlikely. 
Alternatively,  position  5  may  be  explained  by  gene 
conversion  events  or  parallel  mutations.  The  possi¬ 
bility  of  gene  conversion  between  Alu  repeats  has 
been  suggested  previously.^^  In  addition,  limited 
amounts  of  gene  conversion  between  Yb8  Alu 
elements^^'®°  and  extensive  levels  of  short  gene 
conversions  in  the  Ya5  subfamily^®  have  been  pre¬ 
viously  reported. 
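
The proposed order of appearance of the diagnostic sites can be expressed as a simple consistency check. The sketch below is ours and is not the procedure used to build Figure 1; it assumes each element is described by the set of Yb8 diagnostic positions it carries (numbered 1 to 8 as in Figure 1), masks the CpG-affected positions 1, 2 and 6, and treats positions 7 and 8 as a single stage because their relative order cannot be resolved.

```python
# CpG-affected diagnostic positions removed from the analysis (see text).
MASKED_CPG_SITES = {1, 2, 6}

# Proposed order of appearance: 3, then 4, then 7 and/or 8, then 5.
ORDER_STAGES = [{3}, {4}, {7, 8}, {5}]

def informative_sites(present):
    """Drop the masked CpG positions from a set of diagnostic positions."""
    return set(present) - MASKED_CPG_SITES

def fits_sequential_order(present):
    """Return True if an element is consistent with the proposed order.

    Assumption (ours): a stage may carry a change only if every earlier
    stage carries at least one change.
    """
    sites = informative_sites(present)
    seen_gap = False
    for stage in ORDER_STAGES:
        hit = bool(sites & stage)
        if hit and seen_gap:
            return False
        if not hit:
            seen_gap = True
    return True

# Hypothetical elements, for illustration only.
for element in ({3}, {1, 2, 3, 4, 8}, {5}, {4, 7}):
    print(sorted(element), fits_sequential_order(element))
```

Elements that fail the check are candidates for the alternative explanations discussed above (gene conversion, parallel mutation or back mutation at position 4); the check itself does not distinguish between them.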

Phylogenetic  origin 

In  order  to  determine  the  approximate  time  of 
origin  of  each  Alu  subfamily  member  (Ya5  and 
Yb8)  in  the  primate  lineage,  we  amplified  a  series 
of  human  and  non-human  primate  DNA  samples 
using  the  polymerase  chain  reaction  (PCR)  and  the 
oligonucleotide  primers  shown  in  Tables  1  and  2. 
In  this  assay,  genomes  that  are  homozygous  for 
the  presence  of  an  Alu  element  amplify  a  PCR  pro¬ 
duct  about  400  bases  in  length.  Genomes  that  do 
not  contain  the  Alu  element  at  a  particular  chromo¬ 
somal  location  amplify  a  100  bp  fragment,  while 
heterozygous  genomes  amplify  both  fragments. 
Using  this  approach  we  investigated  the  phyloge¬ 
netic  origin  of  each  Alu  element.  All  231  Ya5  Alu 
family  members  were  subjected  to  this  analysis 
and  only  one  element  (Ya5NBC42)  was  present  in 
the  orthologous  locus  from  the  common  chimpan¬ 
zee  genome.  For  the  Yb8  subfamily,  244  elements 
were  assayed  with  none  being  present  in  the  com¬ 
mon  chimpanzee  genome.  This  suggests  that 
almost  all  of  these  Alu  elements  dispersed  within 
the  human  genome  sometime  after  the  human  and 
African  ape  divergence  and  that  less  than  0.21% 
(1/475)  of  the  Ya5  and  Yb8  Alu  subfamily  mem¬ 
bers  in  the  human  genome  also  reside  in  non¬ 
human  primate  genomes.  In  fact,  this  is  only  the 
second  Ya5  Alu  element  ever  reported  that  is  also 
found  in  the  genome  of  a  non-human  primate. 
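
The interpretation of the PCR assay can be summarized in a small sketch. It assumes the approximate product sizes given above (about 400 bp for a filled site and about 100 bp for an empty site); the function name and the size tolerance are hypothetical, and in practice the expected sizes are specific to each locus.

```python
def call_genotype(fragment_sizes_bp, filled_bp=400, empty_bp=100, tolerance_bp=30):
    """Classify one locus in one individual from observed fragment sizes (bp)."""
    has_filled = any(abs(s - filled_bp) <= tolerance_bp for s in fragment_sizes_bp)
    has_empty = any(abs(s - empty_bp) <= tolerance_bp for s in fragment_sizes_bp)
    if has_filled and has_empty:
        return "heterozygous"
    if has_filled:
        return "homozygous present"
    if has_empty:
        return "homozygous absent"
    return "no call"

print(call_genotype([405]))       # filled site only
print(call_genotype([98]))        # empty site only
print(call_genotype([402, 101]))  # both products
```

An element is scored as absent from a non-human primate genome when only the empty-site product is amplified from that genome.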

Human  genomic  diversity 

In  order  to  determine  the  human  genomic  vari¬ 
ation  associated  with  each  of  the  Ya5  and  Yb8  Alu 
family members, each element was subjected to
PCR amplification (outlined above) on a panel of
human  DNA  samples.  The  panel  was  composed  of 
20  individuals  of  European  origin,  20  African 
Americans,  20  Greenland  Natives  or  Asians  and  20 
Egyptians  for  a  total  of  80  individuals  (160 
chromosomes).  Using  this  approach  134  Alu  Ya5 
(Table  1)  and  160  Yb8  (Table  2)  subfamily  members 
were  monomorphic  for  the  presence  of  the  Alu 
element,  suggesting  that  these  elements  integrated 
in  the  genome  prior  to  the  radiation  of  extant 
humans. A total of 28 Ya5 and Yb8 Alu family
members appeared heterozygous in all of the individuals
that were analyzed, suggesting that they
had integrated into previously undefined repeated
regions within the human genome as reported
previously.®^ In the PCR-based assay these elements
generate  a  pre-integration  site  size  product  from 
the  duplicate  copies  of  the  pre-integration  site 
located  throughout  the  genome  along  with  an  Alu 
filled  site  from  the  one  pre-integration  site 
sequence  that  contains  the  new  Alu  insertion. 
These  elements  were  not  subjected  to  any  further 
analysis.  An  additional  six  elements  were  located 
in  other  repetitive  regions  of  the  genome  that  were 
identified  computationally  and  discarded  from 
further  analysis.  The  remaining  elements  were 
polymorphic  for  the  presence  of  an  Alu  repeat 
within  the  genomes  of  the  test  panel  individuals 
(Tables  3  and  4).  Loci  that  were  polymorphic  for 
the  presence/absence  of  individual  Alu  insertions 
were  subsequently  classified  as  high,  low  or  inter¬ 
mediate  frequency  insertion  polymorphisms 
(defined in Tables 1 and 2). The unbiased heterozygosity
values (corrected for small sample sizes)
for these polymorphic Alu insertions were variable,
and approached the theoretical maximum of 50%
in several cases. This suggests that many of these
Alu  insertion  polymorphisms  will  make  excellent 
markers  for  the  study  of  human  population  gen¬ 
etics.  Approximately  25%  (58/231)  of  the  ran¬ 
domly  identified  Ya5  and  20%  (48/244)  of  the  Yb8 
Alu  family  members  are  polymorphic  for  insertion 
presence/absence  within  the  human  genome. 
These  results  are  in  good  agreement  with  previous 
estimates  of  the  percentages  of  insertion  poly¬ 
morphisms  within  these  two  Alu  subfamilies.^^ 
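
For a biallelic insertion locus, the heterozygosity corrected for sample size can be sketched as follows. The text does not state which estimator was used; the sketch assumes the commonly used form h = n/(n - 1) x (1 - sum of squared allele frequencies), where n is the number of chromosomes sampled, and the function name is ours.

```python
def unbiased_heterozygosity(insertion_count, chromosomes):
    """Sample-size-corrected heterozygosity for an insertion/no-insertion locus."""
    if chromosomes < 2:
        raise ValueError("at least two chromosomes are required")
    p = insertion_count / chromosomes  # frequency of the Alu insertion allele
    q = 1.0 - p                        # frequency of the empty-site allele
    return (chromosomes / (chromosomes - 1)) * (1.0 - p * p - q * q)

# Illustrative counts on a panel of 160 chromosomes (80 individuals).
for count in (160, 120, 80, 16):
    print(count, round(unbiased_heterozygosity(count, 160), 3))
```

With equal allele frequencies the uncorrected value is exactly 50%; the correction factor pushes the estimate marginally above that for a finite sample, while a fixed insertion gives zero.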

The  Alu  inserts  that  have  been  in  the  genome 
longest  are  more  likely  to  approach  fixation.  There¬ 
fore,  we  might  expect  to  find  different  levels  of 
sequence  divergence  for  the  Alu  elements  from 
each  insertion  frequency  class.  Using  this  approach 
the  average  number  of  non-CpG/CpG-based 
mutations  for  the  Ya5  Alu  family  was  1.62/1.06, 
2.83/0.67,  2.16/0.66  and  2.53/1.0  for  the  fixed  pre¬ 
sent,  high  frequency,  intermediate  frequency  and 
low  frequency  Alu  insertion  polymorphisms, 
respectively.  In  the  case  of  the  Yb8  subfamily  the 
average  number  of  non-CpG/CpG  mutations  was 
1.86/1.16,  5.0/0.6,  2.2/0.66  and  1.7/1.2  for  the 
fixed present, high frequency, intermediate frequency
and low frequency Alu insertion polymorphisms,
respectively. In all cases the standard
deviations for each average were as large or larger
than the average number of mutations, reflecting
the heterogeneity in the dataset. No detectable
difference  in  the  mutation  density  within  each  fre¬ 
quency  class  of  Alu  insertions  was  observed. 
Therefore,  our  data  suggest  that  any  sequence 
differences  between  the  polymorphic  elements  and 
those  with  fixed  presence  may  be  obscured  because 
of  the  small  number  of  total  mutations  and  sequen¬ 
cing  errors  (see  Discussion). 


Discussion 

Alu  elements  account  for  more  than  10%  of  the 
mass  of  the  human  genome.  The  majority  of  Alu 
elements  integrated  into  the  genome  early  in  pri¬ 
mate  evolution.  Only  a  small  number  of  elements 
(a  few  thousand)  have  amplified  in  the  human 
genome  after  the  divergence  of  humans  and  Afri¬ 
can  apes.  Here,  we  report  an  investigation  of  the 
dispersion  and  insertion  polymorphism  of  the  two 
largest  subfamilies  of  recently  integrated  Alu 
repeats within the human genome. Our copy number
estimates of 2640 Ya5 and 1852 Yb8 Alu
elements within the draft sequence of the human
genome  are  in  fairly  good  agreement  with  previous 
estimates  of  the  sizes  of  these  Alu  subfamilies 
although  they  both  exceed  the  previously  pub¬ 
lished  figures.^^ 

Using  the  mutation  density  and  a  neutral 
mutation  rate  we  were  able  to  estimate  the  ages  of 
each  subfamily  as  5.32  million  years  (myr)  old  for 
Ya5  and  5.30  myr  old  for  Yb8  using  non-CpG- 
based  estimates  and  1.44  myr  (Ya5)  and  1.71  myr 
(Yb8)  using  the  CpG  mutation  density.  Each  of 
these  reported  average  ages  based  upon  non-CpG 
mutation  density  is  substantially  higher  than  those 
reported previously of about 1 myr and 2.7 to 4.1
myr  for  the  Ya5  and  Yb8  subfamilies,  while  the 
estimates  based  upon  CpG  mutation  density  com¬ 
pare  favorably  to  those  previously  reported. If 
we  assume  a  linear  amplification  of  these  Alu  sub¬ 
families  in  the  human  genome,  the  oldest  elements 
would  be  no  greater  than  10.64  myr  old  for  Ya5 
and  10.6  myr  old  for  Yb8  using  non-CpG  mutation 
density,  or  2.88  myr  old  for  Ya5  and  3.42  myr  old 
for Yb8 using the CpG mutation density. The
non-CpG-based estimates for the oldest subfamily
members appear to be somewhat higher than
expected  for  a  group  of  repeated  DNA  sequences 
that  largely  amplified  within  the  human  genome 
after  the  divergence  of  humans  and  African  apes 
which  is  thought  to  have  occurred  within  the  last 
4-6  myr.^^  This  discrepancy  between  the  two  esti¬ 
mates  can  be  explained  by  considering  sequencing 
errors  as  a  potential  factor  influencing  our  current 
calculations.  In  the  determination  of  the  non-CpG 
mutations  for  the  estimation  of  the  Alu  subfamily 
age,  sequencing  errors  would  be  included  in  the 
count as mutations, making the estimated age
higher  than  the  actual  age  for  the  subfamily.  If  we 
assume  that  the  sequencing  errors  are  distributed 
evenly across the entire Alu sequence, then the
number of sequencing errors would be higher in
the  non-CpG-based  estimates  than  the  CpG-based 
estimates,  since  there  are  more  non-CpG  (242-246) 
than  CpG  (only  44-48)  nucleotides  in  the  subfamily 
consensus  sequences.  Our  observation  that  the 
levels  of  sequence  divergence  from  the  subfamily 
consensus  sequences  do  not  effectively  correlate 
with  polymorphism  levels  in  the  human  genome 
also  argues  that  it  will  not  be  beneficial  to  use 
sequence  divergence  from  the  subfamily  consensus 
sequences  as  a  method  for  the  identification  of 
additional  polymorphic  members  of  these  Alu  sub¬ 
families. 
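
The effect of evenly distributed sequencing errors on the two kinds of age estimate can be illustrated numerically. The sketch below is an assumption-laden illustration rather than part of the original analysis: it uses a per-base error level of 0.4% (the upper bound considered later in this Discussion), site counts of 244 non-CpG and 46 CpG positions (midpoints of the ranges quoted above), and the neutral rates used in the Results.

```python
NON_CPG_RATE = 0.15  # % per million years
CPG_RATE = 1.5       # % per million years

def expected_errors_per_element(error_pct, n_sites):
    """Expected number of erroneous bases among n_sites positions."""
    return n_sites * error_pct / 100.0

def age_inflation_myr(error_pct, rate_pct_per_myr):
    """Extra apparent age produced when errors are counted as mutations."""
    return error_pct / rate_pct_per_myr

error_pct = 0.4  # hypothetical uniform sequencing error level, in %

print("errors per element, non-CpG:", expected_errors_per_element(error_pct, 244))
print("errors per element, CpG:    ", expected_errors_per_element(error_pct, 46))
print("age inflation, non-CpG (myr):", round(age_inflation_myr(error_pct, NON_CPG_RATE), 2))
print("age inflation, CpG (myr):    ", round(age_inflation_myr(error_pct, CPG_RATE), 2))
```

Under these assumptions the same error level adds roughly ten times more apparent age to the non-CpG-based estimate than to the CpG-based estimate, because the added density is divided by a tenfold lower neutral rate.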

We  can  also  compare  the  calculated  ages  of  each 
Alu  subfamily  based  upon  non-CpG  mutation  den¬ 
sity  as  a  whole  to  the  estimated  percentages  of  Alu 
insertion  polymorphisms  and  copy  number  to 
evaluate  the  contribution  that  these  elements  make 
to  human  genomic  diversity.  Here,  we  report  esti¬ 
mated  ages  of  1.44  myr  for  the  Ya5  subfamily  and 
1.71  myr  for  the  Yb8  subfamily.  The  percentage  of 
Alu  insertion  polymorphisms  in  each  of  the  subfa¬ 
milies  was  25%  for  the  Ya5  subfamily  and  20%  for 
the  Yb8  subfamily.  The  copy  numbers  of  the  two 
subfamilies  of  Alu  elements  were  also  different 
with  2640  Ya5  Alu  elements  and  1852  Yb8 
elements.  When  considered  together  these  data 
indicate  that  the  Ya5  Alu  subfamily  with  both  a 
higher  copy  number  and  more  insertion  poly¬ 
morphisms  has  been  more  successful  at  amplifica¬ 
tion  within  the  human  genome.  In  fact,  if  we 
assume  that  the  ages  of  the  two  subfamilies  are 
about  the  same  the  Ya5  subfamily  has  been  about 
40%  more  efficient  at  amplification  in  terms  of 
both  copy  number  and  the  generation  of  new  Alu 
insertion  polymorphisms  within  the  human  gen¬ 
ome.  Although  the  sample  size  is  presently  small, 
this  is  also  in  good  agreement  with  the  number  of 
previously  reported  Ya5  (six)  and  Yb8  (three)  Alu 
repeats  associated  with  different  human  diseases 
(reviewed  in  ref.  22).  In  addition,  these  data  also 
provide  compelling  support  for  the  simultaneous 
expansion  of  multiple  Alu  subfamilies  within  the 
human  genome.  The  reasons  for  the  differential 
amplification  of  the  two  Alu  subfamilies  remain 
unknown. However, they likely reside in the ability
of each subfamily to produce RNA for retroposition
or at some other point in the process of
retroposition itself such as the reverse transcription
step.  Further  experiments  will  be  required  to  deter¬ 
mine  the  precise  molecular  mechanism(s)  leading 
to  the  differential  expansion  of  these  two  Alu  sub¬ 
families  within  the  human  genome. 

Table 1. Alu Ya5 accession numbers, locations, human diversity, oligonucleotide primers and PCR parameters

Using the non-CpG-based average ages of the
Ya5 and Yb8 Alu subfamilies along with a linear
amplification rate we can also estimate the number
of members from each Alu subfamily that should
be present within the orthologous loci of the
non-human primate genomes. Using this approach the
oldest Alu repeats from each subfamily would be
approximately twice the average age. In other
words, the Ya5 subfamily would have begun to
expand 10.64 myr ago with the Yb8 subfamily having
expanded about 10.6 myr ago. If we assume
that humans and African apes diverged from each
other only 4 myr ago, then we can calculate that
6.64/10.64 (62%) and 6.6/10.6 (62%) of the Ya5
and Yb8 Alu elements should also be found at
orthologous positions within the genomes of
non-human primates. If we shift the divergence of
humans and African apes to 6 million years ago
then the estimates change to 4.64/10.64 (44%) and
4.6/10.6 (43%). However, less than 0.21% of the
elements were also located in orthologous positions
in the genome of the common chimpanzee.
The observed distribution of Ya5 and Yb8 Alu
repeats located within the common chimpanzee
genome would require a human and non-human
primate divergence of greater than 10 myr ago.
This is clearly a much older divergence time than
is commonly accepted.
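
The expected sharing of elements with non-human primates under a linear amplification model can be written out as a short sketch. The function names are ours; the oldest-element ages (10.64 and 10.6 myr), the divergence times (4 and 6 myr) and the observed fraction (1/475) are the values used in the text.

```python
def expected_shared_fraction(oldest_age_myr, divergence_myr):
    """Fraction of elements inserted before the divergence, assuming a
    constant insertion rate since the oldest element integrated."""
    if oldest_age_myr <= 0:
        raise ValueError("oldest_age_myr must be positive")
    return max(0.0, (oldest_age_myr - divergence_myr) / oldest_age_myr)

def divergence_needed(oldest_age_myr, observed_fraction):
    """Divergence time at which the model reproduces an observed fraction."""
    return oldest_age_myr * (1.0 - observed_fraction)

for divergence in (4.0, 6.0):
    ya5 = expected_shared_fraction(10.64, divergence)
    yb8 = expected_shared_fraction(10.6, divergence)
    print(f"divergence {divergence} myr: Ya5 {ya5:.0%}, Yb8 {yb8:.0%}")

print(f"divergence needed for 1/475 shared: {divergence_needed(10.64, 1 / 475):.2f} myr")
```

The gap between the expected fractions and the single shared element observed is what makes a constant amplification rate difficult to reconcile with the non-CpG-based ages.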

Three  potential  explanations  may  account  for 
this.  One  is  the  selective  removal  of  Alu  elements 
from  orthologous  positions  in  non-human  primate 
genomes  effectively  resulting  in  an  ascertainment 
bias  against  elements  in  the  non-human  primate 
genomes  because  our  elements  were  obtained  by 
scanning  a  database  of  human  genomic  sequences. 
However,  we  consider  this  to  be  highly  unlikely, 
because  there  are  no  known  mechanisms  to  specifi¬ 
cally  remove  Alu  elements  from  primate  genomes 
and  even  when  an  element  is  partially  deleted 
from  the  genome  it  leaves  behind  a  signature  of 
itself.^^  A  second  and  more  likely  explanation  is 
that  the  amplification  rate  for  these  subfamilies  has 
increased  recently  in  the  human  lineage.  Alterna¬ 
tively,  the  higher  average  ages  for  each  of  the  Alu 
subfamilies  than  those  previously  reported  may 
reflect  a  higher  sequencing  error  rate  in  the  gen¬ 
ome  database,  resulting  in  an  inflated  age  estimate 
for  the  Alu  subfamilies.  The  estimated  ages  of  the 
subfamilies  are  also  inflated  by  the  faster  accumu¬ 
lation  of  non-CpG  based  mutations  (as  a  result  of 
the  larger  number  of  potential  target  sites)  as  com¬ 
pared  to  CpG  nucleotides.  Therefore,  the  use  of  the 
CpG-based  mutation  density  for  Alu  subfamily  age 
estimates  will  be  much  more  accurate  than  the  use 
of  non-CpG  mutation  density-based  estimates 
using  the  current  draft  sequence  of  the  human  gen¬ 
ome.  The  magnitude  of  the  putative  sequencing 
errors  can  be  estimated  by  comparing  the  pre¬ 
viously  reported  non-CpG  mutation  density 
for  these  Alu  subfamilies  of  approximately  0.4% 
for  the  Ya5  and  Yb8  Alu  elements  to  the  levels 
reported here of approximately 0.8% for the
same subfamilies. Therefore, the maximum
possible error rate would be estimated as
0.8% - 0.4% = 0.4%. In our data analysis, there
are  a  few  Alu  elements  with  much  higher  mutation 
densities  than  previously  seen.  We  are  not  sure 
whether  these  represent  a  small  number  of  auth¬ 
entic,  highly  divergent  subfamily  members 
(approximately  10%  divergence),  or  the  concen¬ 
tration  of  sequence  errors  in  a  few  elements.  Thus, 
other  than  the  possibility  of  a  few  areas  where 
errors may be concentrated, there is a relatively
low sequencing error rate across the entire database,
demonstrating the reliability of the draft
human genomic sequence. Large scale re-sequencing
of the Alu elements characterized in this
paper  would  resolve  this  issue  and  allow  for  an 
accurate  estimate  of  sequencing  error  rates  within 
the  draft  human  genomic  sequence;  it  would  also 
provide  a  refined  estimation  of  the  average  age  of 
the  Alu  Ya5  and  Yb8  subfamilies  as  well. 

SINE  retroposition  is  the  primary  mode  of 
mobilization  of  Alu  elements,  where  mutations  in 
the  source  gene(s)  create  their  sequence  evolution. 
However,  previously  we  reported  that  gene 
conversion  and  genetic  instability  might  have  also 
significantly  impacted  the  Alu  sequence  architec¬ 
ture.’®  Our  analysis  of  the  Yb8  mosaic  elements 
also  suggests  that  gene  conversion  may  have  influ¬ 
enced  the  evolution  of  the  Yb8  Alu  subfamily. 
Among the alternative explanations for the occurrence
of mosaic elements, multiple parallel
mutations seem unlikely unless there was selection
for these specific mutations, such as the
post-transcriptional selection previously proposed.®
However,  a  selection  process  that  would  only 
select  for  these  specific  mutations  would  be 
improbable.  Recombination  may  have  generated 
some  of  these  mosaic  elements,  but  multiple 
recombination  events  would  be  required,  making  it 
unlikely.  Therefore,  we  believe  gene  conversion  to 
be  the  most  likely  explanation  for  the  existence  of 
the  mosaic  Alu  elements. 

Our  analysis  of  the  human  genomic  diversity 
associated  with  the  Ya5  and  Yb8  Alu  elements 
reported  here  resulted  in  the  recovery  of  106  new 
Alu  insertion  polymorphisms.  The  percentages  of 
Alu  insertion  polymorphisms  recovered  from  each 
subfamily were 25% and 20% for the Ya5 and Yb8
subfamilies,  respectively.  The  percentages  of  Alu 
insertion  polymorphisms  in  these  two  subfamilies 
are  in  good  agreement  with  previously  published 
insertion  polymorphism  estimates  for  these  Alu 
subfamilies.^’  We  can  also  estimate  the  total  num¬ 
ber  of  Alu  insertion  polymorphisms  within  the 
draft  sequence  of  the  human  genome  using  our 
copy  number  estimates  and  the  percentage  of  Alu 
insertion  polymorphisms  associated  with  each 
family. Using this approach we should recover
2640 × 0.25 or about 660 Ya5 Alu insertion polymorphisms
and 1852 × 0.20 or about 370 Yb8 Alu
insertion polymorphisms through the exhaustive
analysis  of  the  draft  sequence  of  the  human  gen¬ 
ome.  Therefore,  the  exhaustive  analysis  of  the 
entire  Ya5  and  Yb8  Alu  subfamilies  from  the  draft 
sequence  of  the  human  genome  should  generate  a 
little  more  than  1000  Alu  insertion  polymorphisms 
from  these  subfamilies. 
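
The projection above is a product of copy number and polymorphic fraction, and can be checked with a brief sketch. The dictionary and function names are illustrative; the copy numbers, the rounded fractions and the underlying counts (58/231 and 48/244) are taken from the text.

```python
copy_number = {"Ya5": 2640, "Yb8": 1852}
rounded_fraction = {"Ya5": 0.25, "Yb8": 0.20}       # as used in the text
observed_counts = {"Ya5": (58, 231), "Yb8": (48, 244)}  # polymorphic, assayed

def projected_polymorphisms(subfamily, use_rounded=True):
    """Project the number of insertion polymorphisms in a whole subfamily."""
    if use_rounded:
        fraction = rounded_fraction[subfamily]
    else:
        polymorphic, assayed = observed_counts[subfamily]
        fraction = polymorphic / assayed
    return copy_number[subfamily] * fraction

total_rounded = sum(projected_polymorphisms(s) for s in copy_number)
total_exact = sum(projected_polymorphisms(s, use_rounded=False) for s in copy_number)
print(f"rounded fractions: {total_rounded:.0f}")
print(f"observed fractions: {total_exact:.0f}")
```

Either way of computing the fraction gives a total slightly above 1000, so the projection is not sensitive to the rounding of the percentages.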

Additional  Alu  insertion  polymorphisms  that  are 
present  in  diverse  human  genomes  may  also  be 
recovered  using  PCR  based  display  approaches 
such  as  those  previously  reported  for  Alu  and 
LINE  elements.’^'^'’  Each  of  the  Alu  insertion  poly¬ 
morphisms  in  the  genome  is  a  temporal  genomic 
fossil  that  is  identical  by  descent  with  a  known 
ancestral state.^^'^^ Previously, the analysis of Alu
insertion  polymorphisms  has  proved  useful  for  the 
study  of  human  population  genetics.^^”"*^  The 
newly  identified  Alu  insertion  polymorphisms 
from  the  Ya5  and  Yb8  Alu  subfamilies  should 
prove  useful  for  the  study  of  human  population 
genetics. 


Materials  and  Methods 


Cell  lines  and  DNA  samples 

The  cell  lines  used  to  isolate  primate  DNA  samples 
were  as  follows:  human  (Homo  sapiens),  HeLa  (ATCC 
CCL2);  and  chimpanzee  (Pan  troglodytes),  Wes  (ATCC 
CRL1609).  Cell  lines  were  maintained  as  directed  by  the 
source  and  DNA  isolations  were  performed  using 
Wizard  genomic  DNA  purification  (Promega).  Human 
DNA  samples  from  the  European,  African  American, 
Asian,  Egyptian,  and  Greenland  Native  population 
groups  were  isolated  from  peripheral  blood  lympho¬ 
cytes'^  available  from  previous  studies.^® 


Computational  analyses 

Initial  screening  of  the  GenBank  non-redundant  and 
high  throughput  genomic  sequence  (HTGS)  databases 
was  performed  using  the  Basic  Local  Alignment  Search 
Tool  (BLAST)'*®  available  from  the  National  Center 
for Biotechnology Information (http://www.ncbi.nlm.nih.gov/).
Copy number estimates were determined
using  Megablast  and  the  draft  human  genome  sequence 
database.^ The database was searched for exact
complements to the oligonucleotides 5'-CCATCCCGGCTAAAAC-3'
and 5'-TGCGCCACTGCAGTCCGCAGTCCG-3'
that are exact matches to a portion of the
Alu Ya5 and Yb8 subfamily consensus sequences
(respectively) that contain unique diagnostic mutations.^*
Sequences that were exact complements to the oligonucleotides
were then subjected to more detailed annotation.
A region composed of 500-1000 bases of flanking
DNA sequence directly adjacent to the sequences identified
from the databases that matched the initial
GenBank BLAST query was subjected to annotation
using the RepeatMasker2 program from the University
of Washington Genome Center server
(http://ftp.genome.washington.edu/c/s.dll/RepeatMasker) or Censor
from the Genetic Information Research Institute
(http://www.girinst.org/Censor_Server-Data_Entry_Forms.html).'*^
These programs annotate the repeat
sequence  content  of  individual  sequences  from  humans 
and  rodents.  A  complete  list  of  the  Alu  elements  ident¬ 
ified  from  the  GenBank  search  is  available  from  MAB. 
The  copy  numbers  for  each  subfamily  of  Alu  elements 
were  determined  by  screening  the  draft  sequence  of  the 
entire  human  genome  with  the  oligonucleotides  shown 
above.^  For  the  Yb8  subfamily  analysis,  the  database 
was  searched  for  matches  to  the  consensus  Yb8  sequence 
without  the  seven-nucleotide  duplication  (287  bases). 
The  sequences  were  then  subjected  to  more  detailed 
analysis  using  Meg  Align  (DNAStar  version  3.1.7  for 
Windows  3.2)  selecting  only  for  Yb8  intermediate 
elements  containing  between  one  and  seven  of  the  Yb8 
diagnostic  sites. 
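The exact-match screen described above can be illustrated with a minimal sketch. It is not the Megablast procedure used in the study; it only shows what "exact complement to the oligonucleotide" means when both strands of a sequence are scanned. The two oligonucleotides are those quoted in the text; the function names and the toy sequence are assumptions made for illustration.

```python
# Subfamily-diagnostic oligonucleotides quoted in the text (5' to 3').
YA5_OLIGO = "CCATCCCGGCTAAAAC"
YB8_OLIGO = "TGCGCCACTGCAGTCCGCAGTCCG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_exact_matches(sequence, oligo):
    """Return sorted (position, strand) pairs for exact oligo matches.

    The '+' strand is searched with the oligo itself and the '-' strand
    with its reverse complement; positions are 0-based on the '+' strand.
    """
    sequence = sequence.upper()
    hits = []
    for strand, query in (("+", oligo.upper()), ("-", reverse_complement(oligo))):
        start = sequence.find(query)
        while start != -1:
            hits.append((start, strand))
            start = sequence.find(query, start + 1)
    return sorted(hits)


# Toy sequence (hypothetical): a Ya5 site on '+' and a Yb8 site on '-'.
toy = "AAAA" + YA5_OLIGO + "TTTT" + reverse_complement(YB8_OLIGO) + "GG"
print(find_exact_matches(toy, YA5_OLIGO))
print(find_exact_matches(toy, YB8_OLIGO))
```

A plain string scan like this tolerates no mismatches, which mirrors the requirement that candidate elements match the diagnostic positions exactly before further annotation.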


Primer design and PCR amplification

PCR primers were designed from flanking unique DNA sequences adjacent to
individual Ya5 and Yb8 Alu elements using the Primer3 software (Whitehead
Institute for Biomedical Research, Cambridge, MA, USA)
(http://www.genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi). The resultant
PCR primers were screened against the GenBank non-redundant database for the
presence of repetitive elements using the BLAST program; primers that resided
within known repetitive elements were discarded and new primers were designed.
PCR amplification was carried out in 25 µl reactions using 50-100 ng of target
DNA, 40 pM of each oligonucleotide primer, 200 µM dNTPs in 50 mM KCl, 1.5 mM
MgCl2, 10 mM Tris-HCl (pH 8.4) and Taq DNA polymerase (1.25 units) as
recommended by the supplier (Life Technologies). Each sample was subjected to
the following amplification cycle: an initial denaturation of 150 seconds at
94 °C; 32 cycles of one minute of denaturation at 94 °C, one minute at the
annealing temperature and one minute of extension at 72 °C; and a final
extension at 72 °C for ten minutes. For analysis, 20 µl of each sample was
fractionated on a 2% agarose gel with 0.25 µg/ml ethidium bromide. PCR
products were directly visualized using UV fluorescence. The sequences of the
oligonucleotide primers, annealing temperatures, PCR product sizes and
chromosomal locations for all Ya5 and Yb8 elements can be found on our website
(http://129.81.225.52). The phylogenetic distribution of all the ascertained
Alu elements was determined by PCR amplification of human and non-human
primate DNA samples. The human genomic diversity associated with each Alu
element was determined by the amplification of 20 individuals from each of
four populations (African-American, Greenland Native or Asian, European and
Egyptian) (160 total chromosomes). The chromosomal location of Alu repeats
identified from clones that had not been previously mapped was determined by
PCR amplification of National Institute of General Medical Sciences (NIGMS)
human/rodent somatic cell hybrid mapping panel 2 (Coriell Institute for
Medical Research, Camden, NJ).

ABSTRACT

Genomic database mining has been a very useful aid in the identification and
retrieval of recently integrated Alu elements from the human genome. We
analyzed Alu elements retrieved from the GenBank database and identified two
new Alu subfamilies, Alu Yb9 and Alu Yc2, and further characterized Yc1
subfamily members. Some members of each of the three subfamilies have inserted
in the human genome so recently that about one-third of the analyzed elements
are polymorphic for the presence/absence of the Alu repeat in diverse human
populations. These newly identified Alu insertion polymorphisms will serve as
identical-by-descent genetic markers for the study of human evolution and
forensics. Three previously classified Alu Y elements linked with disease
belong to the Yc1 subfamily, supporting the retroposition potential of this
subfamily and demonstrating that the Alu Y subfamily currently has a very low
amplification rate in the human genome.


Alu elements have been accumulating in the human genome throughout primate
evolution, reaching a copy number of over a million per genome. However, most
of these Alu copies are not identical and can be classified into several
subfamilies (reviewed in Deininger and Batzer 1993). These different
subfamilies of Alu elements were generated once mutations occurred within the
“master” or “source” gene that actively retroposed at different rates and time
periods of primate evolution (Deininger et al. 1992). Currently, the Alu
retroposition rate is reduced by 100-fold from its peak early in primate
evolution (Shen et al. 1991). The vast majority of the Alu elements present in
the human genome inserted before the radiation of extant humans and are
therefore observed in all individuals in the human population. However, almost
all of the recently integrated Alu elements in the human genome are restricted
to several closely related “young” subfamilies, with the majority being Ya5 and
Yb8 subfamily members (Batzer et al. 1994, 1995). Several of these new
subfamilies appear to originate from an Alu element that fortuitously inserted
into a favorable region of the genome capable of supporting Alu retroposition.
Subsequent or concurrent mutations in the new source element(s) result in
groups of elements that are identifiable as new subfamilies.

Collectively,  the  Alu  Y,  Ya5,  Ya5a2,  Ya8,  and  Yb8  sub¬ 
families  comprise  <10%  of  the  Alu  elements  present 
within  the  human  genome,  with  the  Ya5/8  and  Yb8 
subfamilies  together  accounting  for  <0.5%  of  all  Alu 
elements.  Although  the  human  genome  contains 
>1,000,000  copies  of  Alu  (~15%  of  the  genome;  Smit 
1996),  <0.5%  are  polymorphic.  Due  to  their  recent 
evolutionary  introduction  into  the  human  genome, 
many  of  the  young  Alu  elements  are  polymorphic  be¬ 
tween  individuals  and/or  populations.  There  is  an  in¬ 
verse  correlation  between  the  age  of  the  Alu  subfamily 
and  the  percentage  of  polymorphic  elements  it  con¬ 
tains.  Identification  of  evolutionarily  recent  Alu  sub¬ 
families  and  their  polymorphic  insertions  is  useful  for 
human  population  studies,  forensics,  and  DNA  finger¬ 
printing  for  two  reasons:  (i)  There  is  no  apparent  spe¬ 
cific  mechanism  to  remove  newly  inserted  Alu  repeats, 
making  inserts  identical  by  descent;  and  (ii)  the  Alu 
insertions have a known ancestral state (Batzer and
Deininger 1991; Batzer et al. 1994).

The availability of large quantities of human genomic DNA sequence provided by
the Human Genome Project facilitates genomic database mining for recently
integrated Alu elements. Through this approach we were able to identify the
youngest Alu subfamily reported to date, termed Ya5a2, and determined that the
majority of its members are Alu insertion polymorphisms (Roy et al. 2000). We
expanded our computational analyses to identify other Alu subfamilies derived
from the Alu Y and Yb8 subfamilies. Here, we present the analysis of three of
the most recently formed Alu subfamilies and demonstrate their utility for the
study of human genomic diversity.

MATERIALS  AND  METHODS 

Computational analyses: Sequence alignments for the identification of Alu
subfamilies were made using MegAlign software (DNAStar version 3.1.7 for
Windows 3.2). Screening of the GenBank nonredundant (nr), the high throughput
genome sequence (htgs), and the genomic survey sequence (gss) databases was
performed using the advanced basic local alignment search tool 2.0 (BLAST;
Altschul et al. 1990) available from the National Center for Biotechnology
Information (http://www.ncbi.nlm.nih.gov/). Database searches for Yb8
consensus Alus showed a common single-base variant termed Yb9. The databases
were searched for matches to the 289 bases of the Yb9 consensus sequence (as
inferred from the previous Yb8 analysis) or the 281 bases of the Alu Y
consensus with
the  expected  value  (real)  set  at  -e  LO^"'-'"  and  -e  LOif''*^^ 
respectively, in the advanced BLAST options. Only Alu Yb9 elements with all
nine diagnostic mutations were selected. A similar type of search procedure
was performed with the Yc1 and Yc2 consensus sequences or with an
oligonucleotide query sequence complementary to the subfamily diagnostic base
positions. Only Alu Yc1/Yc2 elements with 100% identity to the oligonucleotide
query sequences or entire subfamily-specific consensus sequence were utilized
for further analysis. To estimate the copy number of the Yb9 subfamily we
searched the draft sequence of the human genome (Lander et al. 2001), using a
subfamily-specific probe that contained the Yb9-specific mutation as well as
the insertion in the Yb8 subfamily. A complete list of the Alu elements
identified from the GenBank search is available from M. A. Batzer or P. L.
Deininger.

DNA samples: Human DNA samples from the European, African-American, Alaskan
Native, Egyptian, and Asian population groups were isolated from peripheral
blood lymphocytes (Ausubel et al. 1996) that were available from previous
studies (Roy et al. 1999).

Oligonucleotide primer design and PCR amplification: Flanking unique DNA
sequences adjacent to each Alu repeat were used to design primers for the Yb9,
Yc1, and Yc2 Alu elements (Table 1). PCR primers and reactions were performed
as previously described (Roy et al. 1999). The heterozygosity associated with
each element was determined by the amplification of 20 individuals from each
of four populations (African American, Alaskan Native or Asian, European, and
Egyptian; 160 total chromosomes). The chromosomal location for elements
identified from randomly sequenced anonymous large-insert clones was
determined by PCR as previously described (Roy et al. 1999).
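The text does not state how heterozygosity was computed from the PCR genotypes. A minimal sketch follows, assuming the commonly used unbiased expected heterozygosity for a biallelic presence/absence marker, H = [2n/(2n - 1)](1 - p^2 - q^2), where n is the number of individuals typed and p and q are the insertion and empty-site allele frequencies. The function names and the genotype counts in the example are hypothetical and are not data from this study.

```python
def insertion_frequency(ins_ins, ins_abs, abs_abs):
    """Frequency of the Alu-insertion allele from diploid genotype counts.

    ins_ins: individuals homozygous for the insertion
    ins_abs: heterozygous individuals
    abs_abs: individuals homozygous for the empty site
    """
    chromosomes = 2 * (ins_ins + ins_abs + abs_abs)
    if chromosomes == 0:
        raise ValueError("at least one individual must be genotyped")
    return (2 * ins_ins + ins_abs) / chromosomes


def unbiased_heterozygosity(ins_ins, ins_abs, abs_abs):
    """Unbiased expected heterozygosity for a biallelic locus (assumed formula)."""
    chromosomes = 2 * (ins_ins + ins_abs + abs_abs)
    p = insertion_frequency(ins_ins, ins_abs, abs_abs)
    q = 1.0 - p
    return chromosomes / (chromosomes - 1) * (1.0 - p * p - q * q)


# Hypothetical population of 20 individuals (40 chromosomes).
print(insertion_frequency(5, 10, 5))
print(unbiased_heterozygosity(5, 10, 5))
```

For a biallelic marker the expected heterozygosity peaks near 0.5 when both alleles are equally frequent and falls to zero when the insertion is fixed present or absent.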


RESULTS 

The Alu Yb9, Yc1, and Yc2 subfamilies: Analysis of a set of 243 Yb8 Alu
elements retrieved from the GenBank database allowed us to identify a putative
subfamily containing all the known Yb8 diagnostic mutations plus one new
mutation, which is referred to as Yb9 in compliance with the standard Alu
subfamily nomenclature (Batzer et al. 1996). The Yb9 consensus sequence is
shown in Figure 1. Searches of the nr, htgs, and gss databases retrieved a
total of 56 Yb9 elements. Of these, 25 elements were retrieved from the nr
database (30.4% of the human genome at the time), giving an estimated size of
82 members for the Yb9 subfamily. This estimate is also in good agreement with
a search of the draft human genomic sequence (Lander et al. 2001) that
identified 79 perfect matches with a Yb9 subfamily-specific query sequence.
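The extrapolation behind the subfamily size estimate is a simple proportion: the number of elements found in a database divided by the fraction of the genome that database covered. A minimal sketch, using only the figures quoted above (25 elements, 30.4% coverage); the function name is an assumption made for illustration.

```python
def estimate_copy_number(elements_found, genome_fraction):
    """Extrapolate a genome-wide copy number from a partial database.

    elements_found: elements retrieved from the database
    genome_fraction: fraction of the genome the database represents (0-1]
    """
    if not 0 < genome_fraction <= 1:
        raise ValueError("genome_fraction must be in (0, 1]")
    return elements_found / genome_fraction


# Yb9: 25 elements in the nr database, which held 30.4% of the genome.
yb9_estimate = estimate_copy_number(25, 0.304)
print(round(yb9_estimate))
```

The estimate assumes that the sequenced fraction is a random sample of the genome with respect to Alu insertion sites.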

Using a different approach, we also retrieved one previously identified
subfamily, Yc1 [formerly termed Sb0 (Jurka 1995)], and a new variant, Yc2.
GenBank database searches for Alu Y elements that perfectly match the
consensus sequence brought several Alu Y elements to our attention that share
one or two specific mutations that differ from the Y consensus. Closer
inspection facilitated the retrieval of the additional Alu subfamilies. BLAST
searches using the consensus sequence for Alu Yc1 and Yc2 will also retrieve a
large number of elements that are matches to the Alu Y subfamily as well,
making the analysis of the elements identified in this manner impractical.
Therefore, we selected only the elements of these subfamilies with 100%
identity to the oligonucleotide query sequence that contained the
subfamily-specific diagnostic bases. A total of 176 Yc1 (13 perfect matches to
the entire subfamily consensus sequence) and 17 Yc2 (11 perfect matches to the
entire subfamily consensus sequence) elements were retrieved. A count of all
Yc1 elements retrieved by BLAST on a single initial search of the nr database
yielded a total of 116 elements, giving an estimated copy number of 381 Yc1
elements in the human genome (the nr database contained 30.4% of the human
genome sequence at the time of the search). Interestingly, three of the four
elements previously classified as Alu Y elements linked to disease (Deininger
and Batzer 1999) belong to the Alu Yc1 subfamily (Figure 2): the de novo
insertion in the C1 inhibitor gene (C1inh; Stoppa-Lyonnet et al. 1990),
another de novo insertion in BRCA2 (BRCA2; Miki et al. 1996), and glycerol
kinase deficiency (GK; Zhang et al. 2000).

Figure 1. Consensus sequence alignment of Y, Yb8, and the potential new
subfamily Yb9 identified. Nucleotide substitutions at each position are
indicated with the appropriate nucleotide. Deletions are marked by dashes (-).
The Yb8 and Yb9 diagnostic nucleotides are indicated in boldface type with the
corresponding diagnostic numbers above.

About one-half of the 56 total Yb9 elements (29) shared 100% nucleotide
identity with the subfamily consensus sequence. To get an approximation of the
age of the Yb9 subfamily, we evaluated the number of non-CpG mutations present
within the different Alu elements as previously described (Roy et al. 2000). A
total of 19 CpG mutations, 25 non-CpG mutations, and two 5' truncations
occurred within the 56 Alu Yb9 subfamily members identified. A neutral rate of
evolution for primate intervening DNA sequences of 0.15% per million years
(Miyamoto et al. 1987) and the non-CpG mutation density of 0.1908% (25/13,104
bases using only non-CpG bases) within the 56 Yb9 Alu elements yield an
estimated average age of 1.27 million years (myr). The age for the Yb9
subfamily members is predicted at a 95% confidence level in the range of
0.8-1.8 myr, given that the mutations were random and fit a binomial
distribution. No analysis can be made for the Yc1 and Yc2 Alu elements,
because only subfamily members with perfect identity to the subfamily
consensus sequence or one mismatch were isolated from the database using one
of the database screening procedures.
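The Yb9 age estimate above is the non-CpG mutation density divided by the neutral substitution rate. A minimal sketch of that arithmetic follows. The text says only that the mutations were assumed to fit a binomial distribution; the normal approximation to the binomial used here for the 95% interval is an assumption, as is the function name.

```python
import math


def subfamily_age_myr(mutations, bases, rate_pct_per_myr=0.15, z=1.96):
    """Estimate subfamily age (myr) with an approximate confidence interval.

    mutations: non-CpG mutations observed across all elements
    bases: total non-CpG bases examined
    rate_pct_per_myr: neutral substitution rate, percent per million years
    z: normal quantile for the interval (1.96 for about 95%)
    Returns (age, lower, upper) in million years.
    """
    density = mutations / bases                      # mutations per base
    std_err = math.sqrt(density * (1 - density) / bases)
    to_myr = 100.0 / rate_pct_per_myr                # fraction -> percent -> myr
    age = density * to_myr
    lower = (density - z * std_err) * to_myr
    upper = (density + z * std_err) * to_myr
    return age, lower, upper


# Figures quoted in the text: 25 non-CpG mutations in 13,104 non-CpG bases.
age, lower, upper = subfamily_age_myr(25, 13104)
print(f"{age:.2f} myr ({lower:.1f}-{upper:.1f} myr)")
```

With so few mutations, an exact binomial interval would be somewhat wider and shifted upward, so the bounds should be read as approximate.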

Phylogenetic distribution and human genomic diversity of the new subfamilies:
Amplification of the Yb9, Yc1, and Yc2 elements from nonhuman primate genomes
facilitated the analysis of the phylogenetic distribution of these elements,
using PCR and the oligonucleotide primers in Table 1. The majority of the
elements evaluated were absent from the genomes of the nonhuman primates,
suggesting that these elements dispersed and were fixed in the human genome
after the human and African ape divergence.

We performed a PCR analysis on a panel of human DNA samples to determine the
levels of human diversity associated with the Alu elements from these new
subfamilies, using the oligonucleotide primers shown in Table 1. The panel
consists of 20 individuals each of European, African-American, Asian, and
Egyptian origin, for a total of 80 individuals (160 chromosomes). We were able
to analyze 28 out of the 56 Yb9 elements, 97 out of 176 Yc1 elements, and 8
out of 17 Yc2 Alu elements, using this approach. Several factors did not allow
for analysis of all the elements. Mainly, we were unable to design appropriate
primers due to insufficient flanking unique DNA sequences or because the
element analyzed resided within another type of repeat as described previously
(Batzer et al. 1991). The Alu elements were classified as fixed present and
high, intermediate, or low frequency insertion polymorphisms (see Table 1 for
definitions). In general, we observed that approximately one-fourth to
one-third of the elements analyzed had some degree of insertion polymorphism
(Yb9 with 10/28, Yc1 with 24/97, and Yc2 with 3/8). The population-specific
genotypes and levels of heterozygosity for each element are shown in Table 2.
The high proportion of polymorphic elements in these Alu subfamilies is in
good agreement with our previous observations, indicating that these
subfamilies are very recent in origin and still actively retroposing within
the human genome.
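The "one-fourth to one-third" statement follows directly from the counts given in the paragraph above. A minimal sketch of the calculation; the dictionary and function names are assumptions made for illustration, and the counts are those quoted in the text.

```python
# (polymorphic elements, elements analyzed) as quoted in the text.
ANALYZED = {
    "Yb9": (10, 28),
    "Yc1": (24, 97),
    "Yc2": (3, 8),
}


def polymorphic_percent(polymorphic, analyzed):
    """Percentage of analyzed elements showing insertion polymorphism."""
    if analyzed <= 0:
        raise ValueError("analyzed must be positive")
    return 100.0 * polymorphic / analyzed


percentages = {
    name: polymorphic_percent(poly, total)
    for name, (poly, total) in ANALYZED.items()
}
for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}% polymorphic")
```

Because only a subset of each subfamily could be assayed, these percentages describe the analyzed elements rather than the whole subfamily.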


DISCUSSION 

From our subset of Alu Yb8 and Y elements, we were able to retrieve three Alu
subfamilies termed Yb9, Yc1, and Yc2. A schematic of the evolutionary
relationship of these subfamilies with the previously defined Alu subfamilies
is shown in Figure 3. Alu subfamilies arise as a result of mutations occurring
in an existing master element or new source elements capable of significant
amplification. In this case, the new subfamilies are presumably examples of
Alu subfamilies that may have originated from the rare instances when an Alu
element fortuitously becomes both transcriptionally and retropositionally
active, therefore allowing it to be another Alu source gene.

Figure 2. Consensus sequence alignment of Y, Yc1, Yc2, and three Alu Yc1
elements associated with disease. The diseases linked with Yc1 Alu elements
are the angioedema caused by a de novo insertion in the C1 inhibitor gene
(C1inh; Stoppa-Lyonnet et al. 1990), breast cancer with another de novo
insertion in BRCA2 (BRCA2; Miki et al. 1996), and glycerol kinase deficiency
(GK; Zhang et al. 2000). Nucleotide substitutions at each position are
indicated with the appropriate nucleotide. Deletions are marked by dashes (-).
The diagnostic nucleotides are indicated in boldface type with the
corresponding diagnostic numbers above.

The young Alu subfamilies are currently active with respect to retroposition,
whereas the older Alu subfamilies typically are not. The old Alu subfamilies
(Sx, J, and Sg1), which comprise the vast majority (>1,000,000 copies) of the
Alu elements present in the human genome, appear completely inactive, as none
of their members have been associated with de novo Alu inserts that result in
human diseases (Table 3). When noting the ratio of reported Alu insertions
associated with diseases to the estimated size of the Alu subfamily, the
younger subfamilies Ya5, Yb8, and Yc1 currently appear to be ~1000 times more
active than the Alu Y subfamily, with 7/2640, 3/1852, and 3/400 compared to
1/200,000 (Table 3). The Alu Ya5a2 subfamily appears to have an even higher
current retroposition rate (1/40), but the very young age and small size of
the subfamily may be an influencing factor. In general, two independent
observations support the current mobility of these young Alu subfamilies
within the human genome. First, there are examples of Alu inserts that have
caused disease that belong to these young subfamilies. Second, the subfamilies
have a high proportion of Alu insertion polymorphisms between
individuals/populations (Table 3), indicating the recent
proliferative/amplification activity of these Alu elements in the human
genome.
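The comparison of subfamily activity above rests on one ratio per subfamily: disease-linked inserts divided by estimated copy number, normalized to the same ratio for Alu Y. A minimal sketch using the figures quoted in the text; the names are assumptions made for illustration.

```python
# (disease-linked inserts, estimated copy number) as quoted in the text.
YOUNG_SUBFAMILIES = {
    "Ya5": (7, 2640),
    "Yb8": (3, 1852),
    "Yc1": (3, 400),
    "Ya5a2": (1, 40),
}
ALU_Y = (1, 200_000)


def relative_activity(disease_inserts, copy_number, baseline=ALU_Y):
    """Disease inserts per copy, relative to the baseline subfamily."""
    baseline_rate = baseline[0] / baseline[1]
    return (disease_inserts / copy_number) / baseline_rate


fold = {
    name: relative_activity(inserts, copies)
    for name, (inserts, copies) in YOUNG_SUBFAMILIES.items()
}
for name, value in fold.items():
    print(f"{name}: about {value:.0f}-fold relative to Alu Y")
```

The individual ratios range from a few hundred to a few thousand, so the figure of roughly 1000-fold is best read as an order of magnitude. With so few disease-linked inserts per subfamily, a single additional report would shift each ratio substantially.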

Alu elements that are polymorphic for insertion presence/absence have
previously proven useful for the study of human population genetics and
forensics (Batzer et al. 1991, 1994; Perna et al. 1992; Novick et al. 1993;
Hammer 1994; Tishkoff et al. 1996; Stoneking et al. 1997; Majumder et al.
1999; Comas et al. 2000; Jorde et al. 2000; Watkins et al. 2001). The
identification of very young Alu subfamilies with a high proportion of
polymorphic members provides new sources of Alu insertion polymorphisms for
the study of human population genetics. However, it is important to note that
an exhaustive analysis of these small subfamilies will only generate a
relatively small number of new Alu insertion polymorphisms.

Master element vs. source gene: Alu elements have been proposed to fit an
evolutionary model where the copies arose from “master” genes (Deininger and
Slagel 1988; Labuda and Striker 1989; Shen et al. 1991; Deininger et al.
1992). A master gene can be defined as an element that is highly active during
a long period, therefore generating a lot of copies of itself. However, we
demonstrated that recently inserted Alu elements (de novo) belong to a variety
of Alu subfamilies, indicating the simultaneous presence of multiple active
elements in the human genome. These active elements that have a low rate of
amplification and are only active for a very short period of time should not
be classified as master genes. To distinguish between them, we suggest the use
of the nomenclature of “master gene” when


[Table 2. Population-specific genotypes and heterozygosity for each Alu
insertion polymorphism; table body not legible in the source.]


[Figure 3 diagram; labeled subfamilies: Yb9, Yb8, Ya8, Ya5a2, Ya5, Yc2, Yc1.]


Figure 3. Schematic diagram of the evolution of recently integrated Alu
subfamilies. All the origins of the young Alu subfamilies are shown. The
origins of the Yb9, Yc1, and Yc2 Alu subfamilies are shown after the
divergence of the Yb8 and the Y subfamily, respectively. The size of the font
is relative to the number of elements within each subfamily, the largest
representing 100,000-200,000 copies; medium, 1000-2000 copies; and the
smallest, 50-500 copies. The total number of elements from each subfamily
linked to disease is indicated to the right. The proportion of polymorphic
elements within each family is represented by the following: ±, rarely
polymorphic elements are found; +, low percentage of polymorphic elements; ++,
~50% of the elements are polymorphic; and +++, most of the elements are
polymorphic.


referring to the highly active genes for long evolutionary periods of time,
like the Alu element that generated the majority (>90%) of the Alu elements
currently present in the genome. For those copies, or daughters, that acquired
the ability to retropose we propose the use of the term “source genes.”
However, some of the elements classified as source genes may be potential
master genes, and only the progression of time will allow the appropriate
distinction to be made.

Evolutionary reduction in the Alu retroposition rate: Our data indicate the
existence of several currently active Alu elements that belong to different
subfamilies within the human genome. However, the present amplification rate
of Alu elements has drastically decreased from when it reached its peak
between 35 and 60 million years ago (mostly Sx subfamily). The majority of the
Alu elements present in the genome of extant humans inserted during this peak
amplification period. There are multiple reasons that could explain the
reduction in the amplification rate of Alu elements. First, mutations within
or near the master Alu element could reduce its retroposition activity or even
totally abolish it by a variety of mechanisms (Deininger and Batzer 1993;
Schmid 1996). Alternatively, mutations within the master gene or in the LINE
elements that affect the ability to “parasitize” LINE element-encoded enzymes
necessary for retroposition could also reduce the Alu amplification rate.
Furthermore, the host may have also evolved cellular mechanisms to reduce Alu
proliferation. Finally, the availability of suitable genomic “insertion sites”
may be reduced, since most evolutionarily neutral or positive sites are
presumably already “filled” with different types of preexisting repeats.
Alternatively, new Alu insertions may result in unacceptable local levels of
unequal homologous recombination (Deininger and Batzer 1999).

TABLE 3

Young Alu subfamilies: copy number, inserts linked to disease, and polymorphism

Alu          Estimated      Inserts linked    General subfamily
subfamily    copy number    with disease^a    polymorphism^b       (%)

J, Sx, Sg1   >1,000,000     0
Y            >200,000       1                 ±
Ya5          2640           7                 +                    26
Ya5a2        40             1                 +++                  80^c
Ya8          70             0                 ++                   50
Yb8          1852           3                 +                    20
Yb9          80             0                 +                    36
Yc1          400            3                 +                    25^c
Yc2          ND             0                 +                    37.5^c

ND, not determined.

^a Previously published Alu elements linked with disease (Deininger and Batzer
1999).

^b The proportion of polymorphic elements within each family is represented by
the following: ±, rarely polymorphic elements are found; +, low percentage of
polymorphic elements; ++, ~50% of the elements are polymorphic; and +++, most
of the elements are polymorphic.

^c Percentage polymorphism was determined using a selected subgroup,
introducing a bias.

A.M.R. was supported by a Brown Foundation fellowship from the Tulane Cancer
Center. This research was supported by National Institutes of Health R01
GM45668 (P.L.D.); Department of the Army DAMD17-98-1-8119 (P.L.D. and M.A.B.);
Louisiana Board of Regents Millennium Trust Health Excellence Fund HEF
(2000-05)-05 and HEF (2000-05)-01 (M.A.B. and P.L.D.); and award
1999-IJ-CX-K009 from the Office of Justice Programs, National Institute of
Justice, Department of Justice (M.A.B.). Points of view in this document are
those of the authors and do not necessarily represent the official position of
the U.S. Department of Justice.


LITERATURE  CITED 

Altschul,  S.  F.,  W.  Gish,  W.  Miller,  E.  W.  Mvers  and  D.  J.  Lipman, 
1990  Basic  local  alignment  search  tool.  J.  Mol.  Biol.  215;  403-^ 
410. 

Ausubel,  F.  M.,  R.  Brent,  R.  E,  Kingston,  D.  D.  Moore,  J.  G., 
Seidman  et  at,  1996  Current  Protocols  In  Molecular  Biolo^.  lohn 
Wiley  &  Sons,  Canada. 

Batzer, M. A., and P. L. Deininger, 1991 A human-specific subfamily
of Alu sequences. Genomics 9: 481-487.

Batzer, M. A., V. A. Gudi, J. C. Mena, D. W. Foltz, R. J. Herrera
et al., 1991 Amplification dynamics of human-specific (HS) Alu
family members. Nucleic Acids Res. 19: 3619-3623.

Batzer, M. A., M. Stoneking, M. Alegria-Hartman, H. Bazan,
D. H. Kass et al., 1994 African origin of human-specific polymorphic
Alu insertions. Proc. Natl. Acad. Sci. USA 91: 12288-12292.

Batzer, M. A., C. M. Rubin, U. Hellmann-Blumberg, M. Alegria-Hartman,
E. P. Leeflang et al., 1995 Dispersion and insertion
polymorphism in two small subfamilies of recently amplified human
Alu repeats. J. Mol. Biol. 247: 418-427.

Batzer, M. A., P. L. Deininger, U. Hellmann-Blumberg, J. Jurka,
D. Labuda et al., 1996 Standardized nomenclature for Alu repeats.
J. Mol. Evol. 42: 3-6.

Comas, D., F. Calafell, N. Benchemsi, A. Helal, G. Lefranc et al.,
2000 Alu insertion polymorphisms in NW Africa and the Iberian
Peninsula: evidence for a strong genetic boundary through the
Gibraltar Straits. Hum. Genet. 107: 312-319.

Deininger, P. L., and M. A. Batzer, 1993 Evolution of retroposons,
pp. 157-196 in Evolutionary Biology, edited by M. K. Hecht et
al. Plenum Publishing, New York.

Deininger, P. L., and M. A. Batzer, 1999 Alu repeats and human
disease. Mol. Genet. Metab. 67: 183-193.

Deininger, P. L., and V. Slagel, 1988 Recently amplified Alu family
members share a common parental Alu sequence. Mol. Cell. Biol.
8: 4566-4569.

Deininger, P. L., M. A. Batzer, C. A. Hutchison and M. H. Edgell,
1992 Master genes in mammalian repetitive DNA amplification.
Trends Genet. 8: 307-311.

Hammer, M. F., 1994 A recent insertion of an Alu element on the
Y chromosome is a useful marker for human population studies.
Mol. Biol. Evol. 11: 749-761.

Jorde, L. B., W. S. Watkins, M. J. Bamshad, M. E. Dixon, C. E.
Ricker et al., 2000 The distribution of human genetic diversity:
a comparison of mitochondrial, autosomal, and Y-chromosome
data. Am. J. Hum. Genet. 66: 979-988.

Jurka, J., 1995 Origin and evolution of Alu repetitive elements, pp.
25-42 in The Impact of Short Interspersed Elements (SINEs) on the Host
Genome, edited by R. J. Maraia. R. G. Landes Company, Austin,
Texas.

Labuda, D., and G. Striker, 1989 Sequence conservation in Alu
evolution. Nucleic Acids Res. 17: 2477-2491.

Lander, E. S., L. M. Linton, B. Birren, C. Nusbaum, M. C. Zody et
al., 2001 Initial sequencing and analysis of the human genome.
Nature 409: 860-921.

Majumder, P. P., B. Roy, S. Banerjee, M. Chakraborty, B. Dey et
al., 1999 Human-specific insertion/deletion polymorphisms in
Indian populations and their possible evolutionary implications.
Eur. J. Hum. Genet. 7: 435-446.

Miki, Y., T. Katagiri, F. Kasumi, T. Yoshimoto and Y. Nakamura,
1996 Mutation analysis in the BRCA2 gene in primary breast
cancers. Nat. Genet. 13: 245-247.

Miyamoto, M. M., J. L. Slightom and M. Goodman, 1987 Phylogenetic
relations of humans and African apes from DNA sequences
in the psi eta-globin region. Science 238: 369-373.

Novick, G. E., T. Gonzalez, J. Garrison, C. C. Novick, M. A. Batzer
et al., 1993 The use of polymorphic Alu insertions in human
DNA fingerprinting. Exper. Suppl. 67: 283-291.

Perna, N. T., M. A. Batzer, P. L. Deininger and M. Stoneking, 1992
Alu insertion polymorphism: a new type of marker for human
population studies. Hum. Biol. 64: 641-648.

Roy, A. M., M. L. Carroll, D. H. Kass, S. V. Nguyen, A.-H. Salem
et al., 1999 Recently integrated human Alu repeats: finding needles
in the haystack. Genetica 107: 149-161.

Roy, A. M., M. L. Carroll, S. V. Nguyen, A.-H. Salem, M. Oldridge
et al., 2000 Potential gene conversion and source gene(s) for
recently integrated Alu elements. Genome Res. 10: 1485-1495.

Schmid, C. W., 1996 Alu: structure, origin, evolution, significance
and function of one-tenth of human DNA. Prog. Nucleic Acid
Res. Mol. Biol. 53: 283-319.

Shen, M. R., M. A. Batzer and P. L. Deininger, 1991 Evolution of
the master Alu gene(s). J. Mol. Evol. 33: 311-320.

Smit, A. F., 1996 The origin of interspersed repeats in the human
genome. Curr. Opin. Genet. Dev. 6: 743-748.

Stoneking, M., J. J. Fontius, S. L. Clifford, H. Soodyall, S. S.
Arcot et al., 1997 Alu insertion polymorphisms and human
evolution: evidence for a larger population size in Africa. Genome
Res. 7: 1061-1071.

Stoppa-Lyonnet, D., P. E. Carter, T. Meo and M. Tosi, 1990 Clusters
of intragenic Alu repeats predispose the human C1 inhibitor
locus to deleterious rearrangements. Proc. Natl. Acad. Sci. USA
87: 1551-1555.

Tishkoff, S. A., G. Ruano, J. R. Kidd and K. K. Kidd, 1996 Distribution
and frequency of a polymorphic Alu insertion at the plasminogen
activator locus in humans. Hum. Genet. 97: 759-764.




Watkins, W. S., C. E. Ricker, M. J. Bamshad, M. L. Carroll, S. V.
Nguyen et al., 2001 Patterns of ancestral human diversity: an
analysis of Alu-insertion and restriction-site polymorphisms. Am.
J. Hum. Genet. 68: 738-752.

Zhang, Y., K. M. Dipple, E. Vilain, B. L. Huang, G. Finlayson
et al., 2000 AluY insertion (IVS4-52ins316alu) in the glycerol
kinase gene from an individual with benign glycerol kinase deficiency.
Hum. Mutat. 15: 316-323.

Communicating editor: Y.-X. Fu


